Abstract

This paper formulates and solves a sequential detection problem that involves the mutual information 
(stochastic observability) of a Gaussian process observed in noise with missing measurements. The 
main result is that the optimal decision is characterized by a monotone policy on the partially ordered 
set of positive definite covariance matrices. This monotone structure implies that numerically efficient 
algorithms can be designed to estimate and implement monotone parametrized decision policies. The 
sequential detection problem is motivated by applications in radar scheduling where the aim is to 
maintain the mutual information of all targets within a specified bound. We illustrate the problem 
formulation and performance of monotone parametrized policies via numerical examples in fly-by and 
persistent-surveillance applications involving a GMTI (Ground Moving Target Indicator) radar.

I. Introduction

Consider the following sequential detection problem. $L$ targets (Gaussian processes) are
allocated priorities $\nu^1, \nu^2, \ldots, \nu^L$. A sensor obtains measurements of these $L$
evolving targets with signal to noise ratio (SNR) for target $l$ proportional to priority $\nu^l$.
A decision maker has two choices at each time $k$: If the decision maker chooses action
$u_k = 2$ (continue) then the sensor takes another measurement and accrues a measurement cost
$c_\nu$. If the decision maker chooses action $u_k = 1$ (stop), then a stopping cost proportional
to the mutual information (stochastic observability) of the targets is accrued and the problem
terminates. What is the optimal time for the decision maker to apply the stop action? Our main
result is that the optimal decision policy is a monotone function of the target covariances (with
respect to the positive definite partial ordering). This facilitates devising numerically
efficient algorithms to compute the optimal policy.

The sequential detection problem addressed in this paper is non-trivial since the decision to 
continue or stop is based on Bayesian estimates of the targets' states. In addition to Gaussian noise 
in the measurement process, the sensor has a non-zero probability of missing observations. Hence, 
the sequential detection problem is a partially observed stochastic control problem. Targets with 
high priority are observed with higher SNR and the uncertainty (covariance) of their estimates 
decreases. Lower priority targets are observed with lower SNR and their relative uncertainty 
increases. The aim is to devise a sequential detection policy that maintains the stochastic 
observability (mutual information or conditional entropy) of all targets within a specified bound. 

Why stochastic observability? As mentioned above, the stopping cost in our sequential detection
problem is a function of the mutual information (stochastic observability) of the targets. The
use of mutual information as a measure of stochastic observability was originally investigated in
[?]. In [?], determining optimal observer trajectories to maximize the stochastic observability of
a single target is formulated as a stochastic dynamic programming problem, but no structural
results or characterization of the optimal policy is given; see also [?]. We also refer to [?] where
a nice formulation of sequential waveform design for MIMO radar is given using a Kullback-Leibler
divergence based approach. As described in Section III-C, another favorable property
of stochastic observability is that its monotonicity with respect to covariances does not require
stability of the state matrix of the target (eigenvalues strictly inside the unit circle). In target
models, the state matrix for the dynamics of the target has eigenvalues at 1 and thus is not stable.
Organization and Main Results: 

(i) To motivate the sequential detection problem, Section II presents a GMTI (Ground Moving
Target Indicator) radar with macro/micro-manager architecture and a linear Gaussian state space
model for the dynamics of each target. A Kalman filter is used to track each target over the
time scale at which the micro-manager operates. Due to the presence of missed detections,
the covariance update via the Riccati equation is measurement dependent (unlike the standard
Kalman filter where the covariance is functionally independent of the measurements).

(ii) In Section III, the sequential detection problem is formulated. The cost of stopping is the
stochastic observability, which is based on the mutual information of the targets. The optimal
decision policy satisfies Bellman's dynamic programming equation. However, it is not possible
to compute the optimal policy in closed form.¹ Despite this, our main result (Theorem 1) shows
that the optimal policy is a monotone function of the target covariances. This result is useful
for two reasons: (a) Algorithms can be designed to construct policies that satisfy this monotone
structure. (b) The monotone structural result holds without stability assumptions on the linear
dynamics. So there is an inherent robustness of this result since it holds even if the underlying
model parameters are not exactly specified.

(iii) Section IV exploits the monotone structure of the optimal decision policy to construct finite
dimensional parametrized policies. Then a simulation-based stochastic approximation (adaptive
filtering) algorithm (Algorithm 1) is given to compute these optimal parametrized policies. The
practical implication is that, instead of solving an intractable dynamic programming problem,
we exploit the monotone structure of the optimal policy to compute such parametrized policies
in polynomial time.

(iv) Section V presents a detailed application of the sequential detection problem in GMTI radar
resource management. By bounding the magnitude of the nonlinearity in the GMTI measurement
model, we show that for typical operating values, the system can be approximated by a linear
time invariant state space model. Then detailed numerical examples are given that use the above
monotone policy and stochastic approximation algorithm to demonstrate the performance of the
radar management algorithms. We present numerical results for two important GMTI surveillance
problems, namely, the target fly-by problem and the persistent surveillance problem. In both
cases, detailed numerical examples are given and the performance is compared with periodic
stopping policies. Persistent surveillance has received much attention in the defense literature
[?], [?], since it can provide critical, long-term surveillance information. By tracking targets for
long periods of time using aerial based radars, such as DRDC-Ottawa's XWEAR radar [?] or the
U.S. Air Force's Gorgon Stare Wide Area Airborne Surveillance System, operators can "rewind
the tapes" in order to determine the origin of any target of interest [7].

¹For stochastic control problems with continuum state spaces such as considered in this paper,
apart from special cases such as linear quadratic control and partially observed Markov decision
processes, there are no finite dimensional characterizations of the optimal policy [5]. Bellman's
equation does not translate into practical solution methodologies since the state space is a
continuum. Quantizing the space of covariance matrices to a finite state space and then
formulating the problem as a finite-state Markov decision process is infeasible since such
quantization typically would require an intractably large state space.

(v) The appendix presents the proof of Theorem 1. It uses lattice programming and
supermodularity. A crucial step in the proof is that the conditional entropy described by the
Riccati equation update is monotone. This involves use of Theorem 2, which derives monotone
properties of the Riccati and Lyapunov equations. The idea of using lattice programming and
supermodularity to prove the existence of monotone policies is well known in stochastic control,
see [8] for a textbook treatment of the countable state Markov decision process case. However,
in our case, since the state space comprises covariance matrices that are only partially ordered,
the optimal policy is monotone with respect to this partial order. The structural results of this
paper allow us to determine the nature of the optimal policy without brute force numerical
computation.

Motivation - GMTI Radar Resource Management: This paper is motivated by GMTI radar
resource management problems [9], [10], [11]. The radar macro-manager deals with priority
allocation of targets, determining regions to scan, and target revisit times. The radar
micro-manager controls the target tracking algorithm and determines how long to maintain a
priority allocation set by the macro-manager. In the context of GMTI radar micro-management, the
sequential detection problem outlined above reads: Suppose the radar macro-manager specifies
a particular target priority allocation. How long should the micro-manager track targets using
the current priority allocation before returning control to the macro-manager? Our main result,
that the optimal decision policy is a monotone function of the targets' covariances, facilitates
devising numerically efficient algorithms for the optimal radar micro-management policy.

II. Radar Manager Architecture and Target Dynamics 

This section motivates the sequential detection problem by outlining the macro/micro-manager
architecture of the GMTI radar and target dynamics. (The linear dynamics of the target model
are justified in Section V-A, where a detailed description is given of the GMTI kinematic model.)



A. Macro- and Micro-manager Architecture 

(The reader who is uninterested in the radar application can skip this subsection.) Consider a
GMTI radar with an agile beam tracking $L$ ground moving targets indexed by
$l \in \{1, \ldots, L\}$. In this section we describe a two-time-scale radar management scheme
comprising a micro-manager and a macro-manager.

a) Macro-manager: At the beginning of each scheduling interval $n$, the radar macro-manager
allocates the target priority vector $\nu_n = (\nu_n^1, \ldots, \nu_n^L)$. Here the priority of
target $l$ is $\nu_n^l \in [0,1]$ and $\sum_{l=1}^{L} \nu_n^l = 1$. The priority weight $\nu_n^l$
determines what resources the radar devotes to target $l$. This affects the track variances as
described below. The choice $\nu_n$ is typically rule-based, depending on several extrinsic
factors. For example, in GMTI radar systems, the macro-manager picks the target priority vector
$\nu_{n+1}$ based on the track variances (uncertainty) and threat levels of the $L$ targets. The
track variances of the $L$ targets are determined by the Bayesian tracker as discussed below.

b) Micro-manager: Once the target priority vector $\nu$ is chosen (we omit the subscript $n$
for convenience), the micro-manager is initiated. The clock on the fast time scale $k$ (which is
called the decision epoch time scale in Section V-A) is reset to $k = 0$ and commences ticking.
At this decision epoch time scale, $k = 0, 1, \ldots$, the $L$ targets are tracked/estimated by a
Bayesian tracker. Target $l$ with priority $\nu^l$ is allocated the fraction $\nu^l$ of the total
number of observations (by integrating $\nu^l \Delta$ observations on the fast time scale, see
Section V-A) so that the observation noise variance is scaled by $1/(\nu^l \Delta)$. The question
we seek to answer is: How long should the micro-manager track the $L$ targets with priority
vector $\nu$ before returning control to the macro-manager to pick a new priority vector? We
formulate this as a sequential decision problem.

Note that the priority allocation vector $\nu$ and track variances of the $L$ targets capture the
interaction between the micro- and macro-managers.

B. Target Kinematic Model and Tracker 

We now describe the target kinematic model at the epoch time scale $k$: Let
$s_k^l = [x_k^l, \dot{x}_k^l, y_k^l, \dot{y}_k^l]^T$ denote the Cartesian coordinates and
velocities of the ground moving target $l \in \{1, \ldots, L\}$. Section V-A shows that on the
micro-manager time scale, the GMTI target dynamics can be approximated as the following linear
time invariant Gaussian state space model

$$
\begin{aligned}
s_{k+1}^l &= F s_k^l + G w_k^l, \\
z_k^l &= \begin{cases}
H s_k^l + \dfrac{1}{\sqrt{\nu^l \Delta}}\, v_k^l & \text{with probability } p_d^l, \\
\emptyset & \text{with probability } 1 - p_d^l.
\end{cases}
\end{aligned} \tag{1}
$$

The parameters $F$, $G$, $H$ are defined in Section V. They can be target ($l$) dependent; to
simplify notation we have not done this. In (1), $z_k^l$ denotes a 3-dimensional observation
vector of target $l$ at epoch time $k$. The noise processes $w_k^l$ and
$v_k^l/\sqrt{\nu^l \Delta}$ are mutually independent, white, zero-mean Gaussian random vectors
with covariance matrices $Q^l$ and $R^l(\nu^l)$, respectively. ($Q$ and $R$ are defined in
Section V.) Finally, $p_d^l$ denotes the probability of detection of target $l$, and $\emptyset$
represents a missed observation that contains no information about state $s_k^l$.²

Define the one-step-ahead predicted covariance matrix of target $l$ at time $k$ as

$$
P_k^l = \mathbb{E}\left\{ \left(s_k^l - \mathbb{E}\{s_k^l \mid z_{1:k-1}^l\}\right)
\left(s_k^l - \mathbb{E}\{s_k^l \mid z_{1:k-1}^l\}\right)^T \right\}.
$$

Here the superscript $T$ denotes transpose. Based on the priority vector $\nu$ and model (1), the
covariance of the state estimate of target $l \in \{1, \ldots, L\}$ is computed via the following
measurement dependent Riccati equation

$$
P_{k+1}^l = \mathcal{R}(P_k^l, z_k^l) = F P_k^l F^T + Q^l
- I(z_k^l \neq \emptyset)\, F P_k^l H^T \left(H P_k^l H^T + R^l(\nu^l)\right)^{-1} H P_k^l F^T. \tag{2}
$$

Here $I(\cdot)$ denotes the indicator function. In the special case when a target $l$ is allocated
zero priority ($\nu^l = 0$), or when there is a missing observation ($z_k^l = \emptyset$), then (2)
specializes to the Kalman predictor updated via the Lyapunov equation

$$
P_{k+1}^l = \mathcal{L}(P_k^l) = F P_k^l F^T + Q^l. \tag{3}
$$
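As an illustration of how updates (2) and (3) interact, the following is a minimal sketch for a scalar state ($m = 1$). The function names and all numerical values ($F$, $H$, $Q$, the base noise variance `R0`, and `delta`) are hypothetical choices made for illustration and are not taken from the paper; the only structure borrowed from the text is the indicator on detection and the scaling of the noise variance by $1/(\nu^l \Delta)$.

```python
import random


def lyapunov_update(P, F=1.0, Q=0.1):
    """Kalman predictor covariance update (3) for a scalar state: F P F + Q."""
    return F * P * F + Q


def riccati_update(P, detected, nu, F=1.0, H=1.0, Q=0.1, R0=1.0, delta=10.0):
    """Measurement-dependent Riccati update (2) for a scalar state.

    The correction term is subtracted only when an observation was received
    and the target has non-zero priority; otherwise this reduces to (3).
    """
    P_next = lyapunov_update(P, F, Q)
    if detected and nu > 0:
        R = R0 / (nu * delta)  # assumed form: noise variance scaled by 1/(nu * delta)
        P_next -= (F * P * H) ** 2 / (H * P * H + R)
    return P_next


def track_covariance(P0, nu, p_detect, steps, seed=0):
    """Simulate the covariance sequence under random missed detections."""
    rng = random.Random(seed)
    P, history = P0, [P0]
    for _ in range(steps):
        detected = rng.random() < p_detect
        P = riccati_update(P, detected, nu)
        history.append(P)
    return history
```

For example, `track_covariance(1.0, nu=0.0, p_detect=0.9, steps=5)` applies only the predictor update, so the covariance grows by `Q` at every step, whereas a positive priority shrinks the covariance whenever a detection occurs.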

III. Sequential Detection Problem

This section presents our main structural result on the sequential detection problem. Section
III-A formulates the stopping cost in terms of the mutual information of the targets being
tracked. Section III-B formulates the sequential detection problem. The optimal decision policy
is expressed as the solution of a stochastic dynamic programming problem. The main result
(Theorem 1 in Section III-C) states that the optimal policy is a monotone function of the
target covariance. As a result, the optimal policy can be parametrized by monotone policies and
estimated in a computationally efficient manner via stochastic approximation (adaptive filtering)
algorithms. This is described in Section IV.



Notation: Given the priority vector $\nu$ allocated by the macro-manager, let
$a \in \{1, \ldots, L\}$ denote the highest priority target, i.e., $a = \arg\max_l \nu^l$. Its
covariance is denoted $P^a$. We use the notation $P^{-a}$ to denote the set of covariance matrices
of the remaining $L - 1$ targets. The sequential decision problem below is formulated in terms of
$(P^a, P^{-a})$.

²With suitable notational abuse, we use '$\emptyset$' as a label to denote a missing observation.
When a missing observation is encountered, the track estimate is updated by the Kalman predictor
with covariance update (3).

A. Formulation of Mutual Information Stopping Cost

As mentioned in Section II-A, once the radar macro-manager determines the priority vector $\nu$,
the micro-manager switches on and its clock $k = 1, 2, \ldots$ begins to tick. The radar
micro-manager then solves a sequential detection problem involving two actions: At each slot $k$,
the micro-manager chooses action $u_k \in \{1 \text{ (stop)}, 2 \text{ (continue)}\}$. To
formulate the sequential detection problem, this subsection specifies the costs incurred with
these actions.

Radar operating cost: If the micro-manager chooses action $u_k = 2$ (continue), it incurs the
radar operating cost denoted as $c_\nu$. Here $c_\nu \ge 0$ depends on the radar operating
parameters.

Stopping cost - Stochastic Observability: If the micro-manager chooses action $u_k = 1$ (stop),
a stopping cost is incurred. In this paper, we formulate a stopping cost in terms of the stochastic
observability of the targets, see also [?], [?]. Define the stochastic observability of each target
$l \in \{1, \ldots, L\}$ as the mutual information

$$
I(s_k^l; z_{1:k}^l) = \alpha^l h(s_k^l) - \beta^l h(s_k^l \mid z_{1:k}^l). \tag{4}
$$

In (4), $\alpha^l$ and $\beta^l$ are non-negative constants chosen by the designer. Recall from
information theory [12] that $h(s_k^l)$ denotes the differential entropy of target $l$ at time
$k$. Also $h(s_k^l \mid z_{1:k}^l)$ denotes the conditional differential entropy of target $l$ at
time $k$ given the observation history $z_{1:k}^l$. The mutual information
$I(s_k^l; z_{1:k}^l)$ is the average reduction in uncertainty of the target's coordinates $s_k^l$
given measurements $z_{1:k}^l$. In the standard definition of mutual information
$\alpha^l = \beta^l = 1$. However, we are also interested in the special case when
$\alpha^l = 0$, in which case we are considering the conditional entropy for each target (see
Case 4 below).

Consider the following stopping cost if the micro-manager chooses action $u_k = 1$ at time $k$:

$$
C(s_k, z_{1:k}) = -I(s_k^a; z_{1:k}^a) + F\left(\{ I(s_k^l; z_{1:k}^l);\ l \neq a \}\right). \tag{5}
$$

Recall $a$ denotes the highest priority target. In (5), $F(\cdot)$ denotes a function chosen by the
designer to be monotone increasing in each of its $L - 1$ variables (examples are given below).

The following lemma follows from straightforward arguments in [?].

Lemma 1: Under the assumption of linear Gaussian dynamics (1) for each target $l$, the mutual
information of target $l$ defined in (4) is

$$
I(s_k^l; z_{1:k}^l) = \alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|, \tag{6}
$$

where
$\bar{P}_k^l = \mathbb{E}\{(s_k^l - \mathbb{E}\{s_k^l\})(s_k^l - \mathbb{E}\{s_k^l\})^T\}$ and
$P_k^l = \mathbb{E}\{(s_k^l - \mathbb{E}\{s_k^l \mid z_{1:k}^l\})(s_k^l - \mathbb{E}\{s_k^l \mid z_{1:k}^l\})^T\}$.
Here $\bar{P}_k^l$ denotes the predicted (a priori) covariance of target $l$ at epoch $k$ given no
observations. It is computed using the Kalman predictor covariance update (3) for $k$ iterations.
Also, $P_k^l$ is the posterior covariance and is computed via the Kalman filter covariance
update (2). ■
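To make formula (6) concrete, here is a minimal sketch for a scalar state, in which the determinant is simply the variance. It runs the predictor update (3) and the filter update (2) side by side for $k$ steps and then evaluates $\alpha \log \bar{P}_k - \beta \log P_k$. The function names and the parameter values are illustrative assumptions, and for simplicity every observation is assumed to be detected.

```python
import math


def predictor_update(P, F=1.0, Q=0.1):
    """Lyapunov update (3), scalar case."""
    return F * P * F + Q


def filter_update(P, F=1.0, H=1.0, Q=0.1, R=1.0):
    """Riccati update (2), scalar case, assuming the observation is received."""
    return F * P * F + Q - (F * P * H) ** 2 / (H * P * H + R)


def mutual_information(P0, k, alpha=1.0, beta=1.0):
    """Evaluate (6): alpha * log(P_bar_k) - beta * log(P_k) after k steps."""
    P_bar = P = P0
    for _ in range(k):
        P_bar = predictor_update(P_bar)  # covariance given no observations
        P = filter_update(P)             # covariance given all observations
    return alpha * math.log(P_bar) - beta * math.log(P)
```

With `alpha=0.0` the function returns `-beta * log(P_k)`, which corresponds to the conditional-entropy special case discussed in the text.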
Using Lemma 1, the stopping cost $C(\cdot, \cdot)$ in (5) can be expressed in terms of the Kalman
filter and predictor covariances. Define the four-tuple of sets of covariance matrices

$$
P_k = (P_k^a, \bar{P}_k^a, P_k^{-a}, \bar{P}_k^{-a}). \tag{7}
$$

Therefore the stopping cost (5) can be expressed as

$$
C(P_k) = -\alpha^a \log|\bar{P}_k^a| + \beta^a \log|P_k^a|
+ F\left(\{\alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|;\ l \neq a\}\right). \tag{8}
$$

Examples: We consider the following examples of $F(\cdot)$ in (5):

Case 1. Maximum mutual information difference stopping cost:
$C(s_k, z_{1:k}) = -I(s_k^a; z_{1:k}^a) + \max_{l \neq a} I(s_k^l; z_{1:k}^l)$, in which case,

$$
C(P_k) = -\alpha^a \log|\bar{P}_k^a| + \beta^a \log|P_k^a|
+ \max_{l \neq a} \left[\alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|\right]. \tag{9}
$$

The stopping cost is the difference in mutual information between the target with highest mutual
information and the target with highest priority. This can be viewed as a stopping cost that
discourages stopping too soon.

Case 2. Minimum mutual information difference stopping cost:
$C(s_k, z_{1:k}) = -I(s_k^a; z_{1:k}^a) + \min_{l \neq a} I(s_k^l; z_{1:k}^l)$, in which case,

$$
C(P_k) = -\alpha^a \log|\bar{P}_k^a| + \beta^a \log|P_k^a|
+ \min_{l \neq a} \left[\alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|\right]. \tag{10}
$$

The stopping cost is the difference in mutual information between the target with lowest mutual
information and the target with highest priority. This can be viewed as a conservative stopping
cost in the sense that preference is given to stop sooner.

Case 3. Average mutual information difference stopping cost:
$C(s_k, z_{1:k}) = -I(s_k^a; z_{1:k}^a) + \sum_{l \neq a} I(s_k^l; z_{1:k}^l)$, in which case,

$$
C(P_k) = -\alpha^a \log|\bar{P}_k^a| + \beta^a \log|P_k^a|
+ \sum_{l \neq a} \left[\alpha^l \log|\bar{P}_k^l| - \beta^l \log|P_k^l|\right]. \tag{11}
$$

This stopping cost is the difference between the average mutual information of the $L - 1$ targets
(if $\alpha^l$ and $\beta^l$ include a $1/(L-1)$ term) and the highest priority target.

October 20, 2011 DRAFT

Case 4. Conditional differential entropy difference stopping cost: We are also interested in the
following special case which involves scheduling between a Kalman filter and $L - 1$
measurement-free Kalman predictors, see [13]. Suppose the high priority target $a$ is allocated a
Kalman filter and the remaining $L - 1$ targets are allocated measurement-free Kalman predictors.
This corresponds to the case where $\nu^a = 1$ and $\nu^l = 0$ for $l \neq a$ in (1), that is, the
radar assigns all its resources to target $a$ and no resources to any other target. Then solving
the sequential detection problem is equivalent to posing the following question: What is the
optimal stopping time $\tau$ when the radar should decide to start tracking another target? In
this case, the mutual information of each target $l \neq a$ is zero (since
$P_k^l = \bar{P}_k^l$ in (6)). So it is appropriate to choose $\alpha^l = 0$ for $l \neq a$ in
(8). Note from (4) that when $\alpha^l = 0$, the stopping cost of each individual target becomes
the negative of its conditional entropy. That is, the stopping cost is the difference in the
conditional differential entropy instead of the mutual information.
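The stopping costs of Cases 1 to 3 differ only in how the mutual informations of the $L - 1$ lower priority targets are combined. The sketch below takes the per-target values of (6) as plain numbers and applies the max, min, or average; the function name and the `case` argument are hypothetical conveniences, and Case 3 is implemented with the $1/(L-1)$ normalization written out explicitly rather than absorbed into $\alpha^l$ and $\beta^l$.

```python
def stopping_cost(mi_a, mi_others, case):
    """Stopping cost (5): -I^a + F({I^l, l != a}) for Cases 1, 2 and 3.

    mi_a      : mutual information of the highest priority target a
    mi_others : list of mutual informations of the remaining L - 1 targets
    case      : 1 (max), 2 (min) or 3 (average)
    """
    if case == 1:
        combined = max(mi_others)
    elif case == 2:
        combined = min(mi_others)
    elif case == 3:
        combined = sum(mi_others) / len(mi_others)
    else:
        raise ValueError("case must be 1, 2 or 3")
    return -mi_a + combined
```

Since the minimum never exceeds the average, which never exceeds the maximum, the Case 2 cost is always the smallest of the three and the Case 1 cost the largest for the same inputs, in line with the remarks that Case 2 favors stopping sooner and Case 1 discourages it.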

Discussion: A natural question is: How to pick the stopping cost (5) depending on the target
priorities? One can design the choice of stopping cost (namely, Case 1, 2 or 3 above) depending
on the range of target priorities. For example, suppose the priority of a target is the negative of
its mutual information.

(i) If two or more targets have similar high priorities, it makes sense to use Case 2 since the
stopping cost $C(P)$ would be close to zero. This would give incentive for the micro-manager
to stop quickly and consider other high priority targets. Note also that if multiple targets have
similar high priorities, the radar would devote similar amounts of time to them according to
the protocol in Section II-A, thereby not compromising the accuracy of the estimates of these
targets.

(ii) If target $a$ has a significantly higher priority than all other targets, then Case 1 or 3
can be chosen for the stopping cost. As mentioned above, Case 1 would discourage stopping too soon,
thereby allocating more resources to target $a$. In comparison, Case 3 is a compromise between
Case 1 and Case 2, since it would consider the average of all other target priorities (instead of
the maximum or minimum).



Since, as will be shown in Section IV, the parametrized micro-management policies can be
implemented efficiently, the radar system can switch between the above stopping costs in real
time (at the macro-manager time scale). Finally, from a practical point of view, the
macro-manager, which is responsible for assigning the priority allocations, will rarely assign
equal priorities to two targets. This is due to the fact that the priority computation in
realistic scenarios is based on many factors such as target proximity and heading relative to
assets in the surveillance region, error covariances in state estimates, and target type.

B. Formulation of Sequential Decision Problem

With the above stopping and continuing costs, we are now ready to formulate the sequential
detection problem that we wish to solve. Let $\mu$ denote a stationary decision policy of the form

$$
\mu : P_k \to u_{k+1} \in \{1 \text{ (stop)}, 2 \text{ (continue)}\}. \tag{12}
$$

Recall from (7) that $P_k$ is a 4-tuple of sets of covariance matrices. Let $\boldsymbol{\mu}$
denote the family of such stationary policies. For any prior 4-tuple $P_0$ (recall notation (7))
and policy $\mu \in \boldsymbol{\mu}$ chosen by the micro-manager, define the stopping time
$\tau = \inf\{k : u_k = 1\}$. The following cost is associated with the sequential decision
procedure:

$$
J_\mu(P) = \mathbb{E}^\mu\{(\tau - 1) c_\nu + C(P_\tau) \mid P_0 = P\}. \tag{13}
$$

Here $c_\nu$ is the radar operating cost and $C$ the stopping cost introduced in Section III-A.
Also, $\mathbb{E}^\mu$ denotes expectation with respect to stopping time $\tau$ and initial
condition $P$. (A measure-theoretic definition of $\mathbb{E}^\mu$, which involves an absorbing
state to deal with stopping time $\tau$, is given in [14].)

The goal is to determine the optimal stopping time $\tau$ with minimal cost, that is, compute the
optimal policy $\mu^* \in \boldsymbol{\mu}$ to minimize (13). Denote the optimal cost as

$$
J_{\mu^*}(P) = \inf_{\mu \in \boldsymbol{\mu}} J_\mu(P). \tag{14}
$$

The existence of an optimal stationary policy $\mu^*$ follows from [5, Prop. 1.3, Chapter 3].
Since $c_\nu$ is non-negative, for the conditional entropy cost function of Case 4 in Section
III-A, stopping is guaranteed in finite time, i.e., $\tau$ is finite with probability 1. For
Cases 1 to 3, in general $\tau$ is not necessarily finite. However, this does not cause problems
from a practical point of view since the micro-manager typically has a pre-specified upper time
bound at which it always chooses $u_k = 1$ and reverts back to the macro-manager. Alternatively,
for Cases 1 to 3, if one truncates $C(P)$ to some upper bound, then again stopping is guaranteed
in finite time.
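The cost (13) of a given policy can be estimated by simulation. The following is a rough sketch for the scalar Case 4 setting with $L = 2$: target $a$ is tracked by a Kalman filter with missed detections, the other target by a measurement-free predictor, and the stopping cost is $\log P^a - \log P^l$ (that is, $\alpha^l = 0$, $\beta^l = 1$). All names and parameter values are hypothetical. Two conventions are assumptions of this sketch rather than statements from the paper: the operating cost is charged once per continue action and the stopping cost is evaluated at the state where the policy stops, and a forced stop is applied after `max_steps`, mirroring the pre-specified upper time bound mentioned above.

```python
import math
import random


def simulate_cost(policy, P_a, P_l, c_nu=0.05, p_d=0.9, F=1.0, H=1.0,
                  Q=0.1, R=1.0, max_steps=1000, rng=None):
    """One sample of the cost (13) for a scalar two-target Case 4 problem.

    policy(P_a, P_l) must return 1 (stop) or 2 (continue).
    """
    rng = rng or random.Random(0)
    cost = 0.0
    for _ in range(max_steps):
        if policy(P_a, P_l) == 1:
            break
        cost += c_nu  # operating cost for one more measurement
        detected = rng.random() < p_d
        gain = (F * P_a * H) ** 2 / (H * P_a * H + R) if detected else 0.0
        P_a = F * P_a * F + Q - gain  # Riccati update (2) for target a
        P_l = F * P_l * F + Q         # Lyapunov update (3) for the other target
    return cost + math.log(P_a) - math.log(P_l)


def estimate_cost(policy, P_a, P_l, runs=200, seed=0):
    """Monte Carlo average of simulate_cost over independent runs."""
    rng = random.Random(seed)
    total = sum(simulate_cost(policy, P_a, P_l, rng=rng) for _ in range(runs))
    return total / runs
```

For instance, `estimate_cost(lambda P_a, P_l: 1 if P_a < 0.5 * P_l else 2, 2.0, 1.0)` evaluates a simple threshold rule; comparing such estimates across candidate rules is the basic operation that a simulation-based policy search would repeat.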



Considering the above cost (13), the optimal stationary policy $\mu^* \in \boldsymbol{\mu}$ and
associated value function $V(P) = J_{\mu^*}(P)$ are the solution of the following "Bellman's
dynamic programming equation" [?]. (Recall our notation
$P = (P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a})$.)

$$
\begin{aligned}
V(P) &= \min\left\{ C(P),\ c_\nu + \mathbb{E}\left[ V\left(\mathcal{R}(P^a, z^a),
\mathcal{L}(\bar{P}^a), \mathcal{R}(P^{-a}, z^{-a}), \mathcal{L}(\bar{P}^{-a})\right) \right] \right\}, \\
\mu^*(P) &= \arg\min\left\{ C(P),\ c_\nu + \mathbb{E}\left[ V\left(\mathcal{R}(P^a, z^a),
\mathcal{L}(\bar{P}^a), \mathcal{R}(P^{-a}, z^{-a}), \mathcal{L}(\bar{P}^{-a})\right) \right] \right\},
\end{aligned} \tag{15}
$$

where $\mathcal{R}$ and $\mathcal{L}$ were defined in (2) and (3). Here
$\mathcal{R}(P^{-a}, z^{-a})$ denotes the Kalman filter covariance update for the $L - 1$ lower
priority targets according to (2). Our goal is to characterize the optimal policy $\mu^*$ and
optimal stopping set defined as

$$
\mathcal{S}_{\text{stop}} = \{(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}) :
\mu^*(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}) = 1\}. \tag{16}
$$

In the special Case 4 of Section III-A when $\alpha^l = 0$, then
$\mathcal{S}_{\text{stop}} = \{(P^a, P^{-a}) : \mu^*(P^a, P^{-a}) = 1\}$.

The dynamic programming equation (15) does not translate into practical solution methodologies
since the space of $P$, 4-tuples of sets of positive definite matrices, is uncountable, and
it is not possible to compute the optimal decision policy in closed form.



C. Main Result: Monotone Optimal Decision Policy 

Our main result below shows that the optimal decision policy $\mu^*$ is a monotone function of
the covariance matrices of the targets. To characterize $\mu^*$ in the sequential decision problem
below, we introduce the following notation:

Let $m$ denote the dimension of the state $s$ in (1). (In the GMTI radar example $m = 4$.)
Let $\mathcal{M}$ denote the set of all $m \times m$ real-valued, symmetric positive semi-definite
matrices. For $P, Q \in \mathcal{M}$ define the positive definite partial ordering $\succeq$ as
$P \succeq Q$ if $x^T P x \ge x^T Q x$ for all $x \neq 0$, and $P \succ Q$ if
$x^T P x > x^T Q x$ for $x \neq 0$. Define $\preceq$ with the inequalities reversed. Notice
that $[\mathcal{M}, \succeq]$ is a partially ordered set (poset).

Note that ordering positive definite matrices also orders their eigenvalues. Let
$x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$ denote vectors with elements in
$\mathbb{R}_+$. Then define the componentwise partial order on $\mathbb{R}^m$ (denoted by
$\preceq_l$) as $x \preceq_l y$ (equivalently, $y \succeq_l x$) if $x_i \le y_i$ for all
$i = 1, \ldots, m$.

For any matrix $P \in \mathcal{M}$, let $\lambda_P \in \mathbb{R}_+^m$ denote the eigenvalues of
$P$ arranged in decreasing order as a vector. Note $P \succeq Q$ implies
$\lambda_P \succeq_l \lambda_Q$. Clearly, $[\mathbb{R}_+^m, \succeq_l]$ is a poset.

Define scalar function $f$ to be increasing³ if $\lambda_P \preceq_l \lambda_Q$ implies
$f(\lambda_P) \le f(\lambda_Q)$, or equivalently, if $P \preceq Q$ implies $f(P) \le f(Q)$.
Finally we say that $f(P^{-a})$ is increasing in $P^{-a}$ if $f(\cdot)$ is increasing in each
component $P^l$ of $P^{-a}$, $l \neq a$.
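The two partial orders can be checked numerically. The sketch below does so for $2 \times 2$ symmetric matrices using the closed-form eigenvalues, so that no linear algebra library is needed; the function names and the example matrices are illustrative assumptions. It also exhibits a pair of matrices that is not comparable, which is why the order is only partial.

```python
import math


def eigenvalues_2x2(P):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], in decreasing order."""
    (a, b), (_, d) = P
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean + radius, mean - radius)


def loewner_geq(P, Q, tol=1e-12):
    """True if P >= Q in the positive definite ordering, i.e. P - Q is PSD."""
    diff = [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
    return min(eigenvalues_2x2(diff)) >= -tol


def componentwise_geq(x, y):
    """True if x >= y in the componentwise order on vectors."""
    return all(xi >= yi for xi, yi in zip(x, y))


P = [[2.0, 0.0], [0.0, 2.0]]
Q = [[1.0, 0.5], [0.5, 1.0]]
A = [[2.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 2.0]]

print(loewner_geq(P, Q))                                         # P dominates Q
print(componentwise_geq(eigenvalues_2x2(P), eigenvalues_2x2(Q)))  # so do its eigenvalues
print(loewner_geq(A, B), loewner_geq(B, A))                       # A and B are not comparable
```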

The following is the main result of this paper regarding the policy
$\mu^*(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a})$.

³Throughout this paper, we use the term "increasing" in the weak sense. That is, "increasing"
means non-decreasing. Similarly, the term "decreasing" means non-increasing.

Theorem 1: Consider the sequential detection problem (13) with stochastic observability
cost (5) and stopping set (16).

1) The optimal decision policy $\mu^*(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a})$ is increasing in
$P^a$, decreasing in $\bar{P}^a$, decreasing in $P^{-a}$, and increasing in $\bar{P}^{-a}$ on the
poset $[\mathcal{M}, \succeq]$. Alternatively, $\mu^*(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a})$ is
increasing in $\lambda_{P^a}$, decreasing in $\lambda_{\bar{P}^a}$, decreasing in
$\lambda_{P^{-a}}$ and increasing in $\lambda_{\bar{P}^{-a}}$ on the poset
$[\mathbb{R}_+^m, \succeq_l]$. Here $\lambda_{P^{-a}}$ denotes the $L - 1$ vectors of
eigenvalues $\lambda_{P^l}$, $l \neq a$ (and similarly for $\lambda_{\bar{P}^{-a}}$).

2) In the special case when $\alpha^l = 0$ for all $l \in \{1, \ldots, L\}$ (i.e., Case 4 in
Section III-A where the stopping cost is the conditional entropy), the optimal policy
$\mu^*(P^a, P^{-a})$ is increasing in $P^a$ and decreasing in $P^{-a}$ on the poset
$[\mathcal{M}, \succeq]$. Alternatively, $\mu^*(P^a, P^{-a})$ is increasing in $\lambda_{P^a}$,
and decreasing in $\lambda_{P^{-a}}$ on the poset $[\mathbb{R}_+^m, \succeq_l]$. ■



The proof is in Appendix B. The monotone property of the optimal decision policy $\mu^*$ is useful
since (as described in Section IV) parametrized monotone policies are readily implementable
at the radar micro-manager level and can be adapted in real time. Note that in the context of
GMTI radar, the above policy is equivalent to the radar micro-manager opportunistically deciding
when to stop looking at a target: If the measured quality of the current target is better than some
threshold, then continue; otherwise stop.

To get some intuition, consider the second claim of Theorem 1 when each state process has
dimension $m = 1$. Then the covariance of each target is a non-negative scalar. The second claim
of Theorem 1 says that there exists a threshold switching curve $P^a = g(P^{-a})$, where
$g(\cdot)$ is increasing in each element of $P^{-a}$, such that for $P^a < g(P^{-a})$ it is
optimal to stop, and for $P^a > g(P^{-a})$ it is optimal to continue. This is illustrated in
Figure 1. Moreover, since $g$ is monotone, it is differentiable almost everywhere (by Lebesgue's
theorem).
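A threshold policy of this kind is straightforward to write down in the scalar case. The sketch below uses a hypothetical affine switching curve $g(P^{-a}) = \theta_0 + \sum_{l \neq a} \theta_l P^l$ with non-negative slopes; this particular form of $g$ and the numerical values are assumptions chosen for illustration, since the result above only asserts that some increasing $g$ exists.

```python
def threshold_policy(P_a, P_others, theta0=0.2, theta=(0.5,)):
    """Scalar switching-curve policy: 1 (stop) if P_a < g(P_others), else 2 (continue).

    g is an assumed affine curve with non-negative slopes, so it is
    increasing in each element of P_others.
    """
    g = theta0 + sum(t * p for t, p in zip(theta, P_others))
    return 1 if P_a < g else 2
```

With the default values, `threshold_policy(0.5, [1.0])` returns 1 (stop) and `threshold_policy(1.0, [1.0])` returns 2 (continue), so the action increases with $P^a$; raising the other target's covariance to 2.0 moves the decision at $P^a = 1.0$ back to stop, so the action decreases with $P^{-a}$.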

Fig. 1. Threshold switching curve for optimal decision policy $\mu^*(P^a, P^{-a})$. Claim 2 of
Theorem 1 says that the optimal decision policy is characterized by a monotone increasing
threshold curve $g(\cdot)$ when each target has state dimension $m = 1$.

To prove Theorem 1 we will require the following monotonicity result regarding the Riccati
and Lyapunov equations of the Kalman covariance update. This is proved in Appendix C. Below
$\det(\cdot)$ denotes determinant.

Theorem 2: Consider the Kalman filter Riccati covariance update, $\mathcal{R}(P, z)$, defined in
(2) with possibly missing measurements, and Lyapunov covariance update, $\mathcal{L}(P)$, defined
in (3). The following properties hold for $P \in \mathcal{M}$ and $z \in \mathbb{R}^{m_z}$ (where
$m_z$ denotes the dimension of the observation vector $z$ in (1)):

(i) $\dfrac{\det(\mathcal{R}(P, z))}{\det(P)}$ and (ii) $\dfrac{\det(\mathcal{L}(P))}{\det(P)}$
are monotone decreasing in $P$ on the poset $[\mathcal{M}, \succeq]$. ■



Discussion: An important property of Theorem 2 is that stability of the target system matrix $F$
(see (24)) is not required. In target tracking models (such as (1)), $F$ has eigenvalues at 1 and
is therefore not stable. By using Theorem 2, Lemma 4 (in Appendix A) shows that the stopping
cost involving stochastic observability is a monotone function of the covariances. This monotone
property of the stochastic observability of a Gaussian process is of independent interest.

Instead of stochastic observability (which deals with log-determinants), suppose we had chosen
the stopping cost in terms of the trace of the covariance matrices. Then, in general, it is not true
that $\operatorname{trace}(\mathcal{R}(P, z)) - \operatorname{trace}(P)$ is decreasing in $P$ on
the poset $[\mathcal{M}, \succeq]$. Such a result typically requires stability of $F$.
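The contrast between the determinant ratio and the trace difference can be seen numerically even for a scalar state, where both the determinant and the trace reduce to the variance itself. The sketch below evaluates $\mathcal{R}(P)/P$ and $\mathcal{R}(P) - P$ on a grid for an unstable system with $F = 1.5$; this value, the grid, and the remaining parameters are hypothetical and chosen only to make the effect visible.

```python
def riccati(P, F, H=1.0, Q=0.1, R=1.0):
    """Scalar Riccati update (2) with the observation received."""
    return F * P * F + Q - (F * P * H) ** 2 / (H * P * H + R)


def lyapunov(P, F, Q=0.1):
    """Scalar Lyapunov update (3)."""
    return F * P * F + Q


def is_decreasing(values):
    """True if the sequence is non-increasing."""
    return all(b <= a for a, b in zip(values, values[1:]))


grid = [0.1 * i for i in range(1, 101)]  # P from 0.1 to 10.0
F_unstable = 1.5

ratio_riccati = [riccati(P, F_unstable) / P for P in grid]
ratio_lyapunov = [lyapunov(P, F_unstable) / P for P in grid]
difference = [riccati(P, F_unstable) - P for P in grid]

print(is_decreasing(ratio_riccati))   # ratio in (i) on this grid
print(is_decreasing(ratio_lyapunov))  # ratio in (ii) on this grid
print(is_decreasing(difference))      # trace-style difference on this grid
```

On this grid the two ratios are decreasing while the difference first increases and then decreases, which is consistent with the remark that a trace-based monotonicity result typically needs a stable $F$.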

IV. Parametrized Monotone Policies and Stochastic Optimization Algorithms 

Theorem 1 shows that the optimal sequential decision policy
$\mu^*(P) = \arg\inf_{\mu \in \boldsymbol{\mu}} J_\mu(P)$ is monotone in $P$. Below, we
characterize and compute optimal parametrized decision policies of the form
$\mu_{\theta^*}(P) = \arg\inf_{\theta \in \Theta} J_{\mu_\theta}(P)$ for the sequential detection
problem formulated in Section III-B. Here $\theta \in \Theta$ denotes a suitably chosen finite
dimensional parameter and $\Theta$ is a subset of Euclidean space. Any such parametrized policy
$\mu_{\theta^*}(P)$ needs to capture the essential feature of Theorem 1: it needs to be
decreasing in $\bar{P}^a, P^{-a}$ and increasing in $P^a, \bar{P}^{-a}$. In this section, we
derive several examples of parametrized policies that satisfy this property. We then present
simulation-based adaptive filtering (stochastic approximation) algorithms to estimate these
optimal parametrized policies. To summarize, instead of attempting to solve an intractable
dynamic programming problem (15), we exploit the monotone structure of the optimal decision
policy (Theorem 1) to estimate a parametrized optimal monotone policy (Algorithm 1 below).

A. Parametrized Decision Policies 

Below we give several examples of parametrized decision policies for the sequential detection
problem that are monotone in the covariances. Because such parametrized policies satisfy the
conclusion of Theorem 1, they can be used to approximate the monotone optimal policy of
the sequential detection problem. Lemma 2 below shows that the constraints we specify are
necessary and sufficient for the parametrized policy to be monotone, implying that such policies
$\mu_{\theta^*}(P)$ are an approximation to the optimal policy $\mu^*(P)$ within the appropriate parametrized
class $\Theta$.

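First we consider 3 examples of parametrized policies that are linear in the vector of eigenvalues
$\lambda$ (defined in Section III-C). Recall that m denotes the dimension of state s in (1). Let
$\underline{\theta}^l$ and $\bar{\theta}^l \in \mathbb{R}^m_+$ denote the parameter vectors that parametrize the policy $\mu_\theta$ defined as

$$\mu_\theta(\lambda^a, \lambda^{-a}) = \begin{cases} 1 \text{ (stop)}, & \text{if } -\underline{\theta}^{a\prime}\lambda_{P^a} + \bar{\theta}^{a\prime}\lambda_{\bar{P}^a} + \max_{l \neq a}\left[\underline{\theta}^{l\prime}\lambda_{P^l} - \bar{\theta}^{l\prime}\lambda_{\bar{P}^l}\right] > 1, \\ 2 \text{ (continue)}, & \text{otherwise.} \end{cases} \qquad (17)$$

$$\mu_\theta(\lambda^a, \lambda^{-a}) = \begin{cases} 1 \text{ (stop)}, & \text{if } -\underline{\theta}^{a\prime}\lambda_{P^a} + \bar{\theta}^{a\prime}\lambda_{\bar{P}^a} + \min_{l \neq a}\left[\underline{\theta}^{l\prime}\lambda_{P^l} - \bar{\theta}^{l\prime}\lambda_{\bar{P}^l}\right] > 1, \\ 2 \text{ (continue)}, & \text{otherwise.} \end{cases} \qquad (18)$$

$$\mu_\theta(\lambda^a, \lambda^{-a}) = \begin{cases} 1 \text{ (stop)}, & \text{if } -\underline{\theta}^{a\prime}\lambda_{P^a} + \bar{\theta}^{a\prime}\lambda_{\bar{P}^a} + \sum_{l \neq a}\left[\underline{\theta}^{l\prime}\lambda_{P^l} - \bar{\theta}^{l\prime}\lambda_{\bar{P}^l}\right] > 1, \\ 2 \text{ (continue)}, & \text{otherwise.} \end{cases} \qquad (19)$$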
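The three eigenvalue-based policies differ only in how the terms for the targets $l \neq a$ are combined. A minimal sketch follows, assuming the eigenvalue vector is sorted in decreasing order (the ordering convention of Section III-C is not reproduced here) and using hypothetical names `eig_vector` and `linear_policy`; the numerical weights and covariances are invented for illustration.

```python
import numpy as np

STOP, CONTINUE = 1, 2

def eig_vector(P):
    """Eigenvalues of a symmetric matrix, sorted in decreasing order (assumed convention)."""
    return np.sort(np.linalg.eigvalsh(P))[::-1]

def linear_policy(P, P_bar, theta, theta_bar, a, combine=max):
    """Threshold policy that is linear in the eigenvalue vectors.

    P, P_bar         : lists of filter / predictor covariances, one per target
    theta, theta_bar : lists of nonnegative weight vectors, one per target
    a                : index of the highest-priority target
    combine          : max for (17), min for (18), sum for (19)
    """
    def term(l):
        return theta[l] @ eig_vector(P[l]) - theta_bar[l] @ eig_vector(P_bar[l])

    others = [term(l) for l in range(len(P)) if l != a]
    score = -term(a) + combine(others)
    return STOP if score > 1 else CONTINUE

# Made-up two-target example with m = 2.
theta = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
theta_bar = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
P_bar = [2 * np.eye(2), 2 * np.eye(2)]

small_Pa = linear_policy([np.eye(2), 4 * np.eye(2)], P_bar, theta, theta_bar, a=0)
large_Pa = linear_policy([10 * np.eye(2), 4 * np.eye(2)], P_bar, theta, theta_bar, a=0)
print(small_Pa, large_Pa)
```

With nonnegative weights, enlarging the covariance of the priority target can only move the decision from stop (1) towards continue (2), which is the monotonicity the parametrization is meant to preserve.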

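As a fourth example, consider the parametrized policy in terms of covariance matrices. Below
$\underline{\theta}^l$ and $\bar{\theta}^l \in \mathbb{R}^m$ are unit-norm vectors, i.e., $\underline{\theta}^{l\prime}\underline{\theta}^l = 1$ and $\bar{\theta}^{l\prime}\bar{\theta}^l = 1$ for $l = 1, \ldots, L$. Let $U$
denote the space of unit-norm vectors. Define the parametrized policy $\mu_\theta$, $\theta \in \Theta = U$ as

$$\mu_\theta(P^a, P^{-a}) = \begin{cases} 1 \text{ (stop)}, & \text{if } -\underline{\theta}^{a\prime}P^a\underline{\theta}^a + \bar{\theta}^{a\prime}\bar{P}^a\bar{\theta}^a + \sum_{l \neq a}\left[\underline{\theta}^{l\prime}P^l\underline{\theta}^l - \bar{\theta}^{l\prime}\bar{P}^l\bar{\theta}^l\right] > 1, \\ 2 \text{ (continue)}, & \text{otherwise.} \end{cases} \qquad (20)$$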
The following lemma states that the above parametrized policies satisfy the conclusion of
Theorem 1 that the policies are monotone. The proof is straightforward and hence omitted.

Lemma 2: Consider each of the parametrized policies (17), (18), (19). Then $\underline{\theta}^l, \bar{\theta}^l \in \mathbb{R}^m_+$
is necessary and sufficient for the parametrized policy to be monotone increasing in $P^{a}$, $\bar{P}^{-a}$
and decreasing in $P^{-a}$, $\bar{P}^{a}$. For (20), $\theta \in \Theta = U$ (unit-norm vectors) is necessary and sufficient
for the parametrized policy $\mu_\theta$ to be monotone increasing in $P^{a}$, $\bar{P}^{-a}$ and decreasing in $P^{-a}$, $\bar{P}^{a}$.

Lemma 2 says that since the constraints on the parameter vector $\theta$ are necessary and sufficient
for a monotone policy, the classes of policies (17), (18), (19) and (20) do not leave out any
monotone policies; nor do they include any non-monotone policies. Therefore optimizing over
$\Theta$ for each case yields the best approximation to the optimal policy within the appropriate class.

Remark: Another example of a parametrized policy that satisfies Lemma 2 is obtained by
replacing $\lambda_X$ with $\log\det(X)$ in (17), (18), (19). In this case, the parameters $\underline{\theta}^a$, $\bar{\theta}^a$, $\underline{\theta}^l$, $\bar{\theta}^l$ are
scalars. However, numerical studies (not presented here) show that this scalar parametrization is
not rich enough to yield useful decision policies.



B. Stochastic Approximation Algorithm to estimate $\theta^*$

Having characterized monotone parametrized policies above, our next goal is to compute the
optimal parametrized policy $\mu_{\theta^*}$ for the sequential detection problem described in Section III-B.
This can be formulated as the following stochastic optimization problem:

$$J_{\mu_{\theta^*}} = \inf_{\theta \in \Theta} J_\theta(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}),$$

$$\text{where } J_\theta(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}) = \mathbb{E}^{\mu_\theta}\left\{(\tau - 1)c_\nu + C(P^a_\tau, \bar{P}^a_\tau, P^{-a}_\tau, \bar{P}^{-a}_\tau) \mid P_0 = P, \bar{P}_0 = \bar{P}\right\}. \qquad (21)$$

Recall that $\tau$ is the stopping time at which stop action $u_k = 1$ is applied, i.e. $\tau = \inf\{k : u_k = 1\}$.

The optimal parameter $\theta^*$ in (21) can be computed by simulation-based stochastic optimization
algorithms as we now describe. Recall that for the first three examples above (namely, (17),
(18) and (19)), there is the explicit constraint that $\underline{\theta}^l$ and $\bar{\theta}^l \in \mathbb{R}^m_+$. This constraint can be
eliminated straightforwardly by choosing each component of $\theta^l$ as $\theta^l(i) = [\phi^l(i)]^2$ where
$\phi^l(i) \in \mathbb{R}$. The optimization problem (21) can then be formulated in terms of this new unconstrained
parameter vector $\phi$.

In the fourth example above, namely (20), the parameter $\theta^l$ is constrained to the boundary set
of the m-dimensional unit hypersphere $U$. This constraint can be eliminated by parametrizing
$\theta^l$ in terms of spherical coordinates as follows: Let

$$\theta^l(1) = \cos\phi^l(1), \quad \theta^l(i) = \prod_{j=1}^{i-1}\sin\phi^l(j)\,\cos\phi^l(i), \; i = 2, \ldots, m-1, \quad \theta^l(m) = \prod_{j=1}^{m-1}\sin\phi^l(j), \qquad (22)$$

where $\phi^l(i) \in \mathbb{R}$, $i = 1, \ldots, m-1$, denote a parametrization of $\theta^l$. The squared components
telescope, so it is readily verified that $\theta^l \in U$. Again the optimization problem (21) can then be
formulated in terms of this new unconstrained parameter vector $\phi^l \in \mathbb{R}^{m-1}$.
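Both reparametrizations are easy to express in code. The sketch below is an illustration, not the authors' implementation: `nonneg_from_unconstrained` assumes the componentwise-square map for the nonnegativity constraint, and `unit_vector_from_angles` assumes the standard spherical-coordinate map with m-1 angles, which is the count for which the resulting vector has unit norm. Both function names are hypothetical.

```python
import numpy as np

def nonneg_from_unconstrained(phi):
    """Componentwise square: maps any real vector to a nonnegative vector."""
    return np.asarray(phi, dtype=float) ** 2

def unit_vector_from_angles(phi):
    """Spherical coordinates: m-1 unconstrained angles -> unit-norm vector in R^m."""
    phi = np.asarray(phi, dtype=float)
    m = phi.size + 1
    theta = np.empty(m)
    sin_prod = 1.0                       # running product sin(phi_1) ... sin(phi_{i-1})
    for i in range(m - 1):
        theta[i] = sin_prod * np.cos(phi[i])
        sin_prod *= np.sin(phi[i])
    theta[m - 1] = sin_prod              # last component: product of all sines
    return theta

theta_unit = unit_vector_from_angles([0.3, -1.2, 2.5])   # m = 4
theta_pos = nonneg_from_unconstrained([-0.7, 0.0, 1.3, 2.0])
print(theta_unit, np.linalg.norm(theta_unit))
print(theta_pos)
```

Because every real angle vector maps to a feasible parameter, a gradient-type search can run on `phi` without any projection step.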

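Algorithm 1 Policy Gradient Algorithm for computing optimal parametrized policy
Step 1: Choose initial threshold coefficients $\phi_0$ and parametrized policy $\mu_{\phi_0}$.

Step 2: For iterations n = 0, 1, 2, . . .

• Evaluate sample cost $J_n(\mu_{\phi_n}) = (\tau - 1)c_\nu + C(P^a_\tau, \bar{P}^a_\tau, P^{-a}_\tau, \bar{P}^{-a}_\tau)$.
• Compute gradient estimate $\hat{\nabla}_\phi J_n(\mu_{\phi_n})$ as:

$$\hat{\nabla}_\phi J_n = \frac{J_n(\phi_n + \omega_n d_n) - J_n(\phi_n - \omega_n d_n)}{2\omega_n}\, d_n, \qquad d_n(i) = \begin{cases} -1, & \text{with probability } 0.5, \\ +1, & \text{with probability } 0.5. \end{cases}$$

Here $\omega_n = \omega/(n+1)^\gamma$ denotes the gradient step size with $0.5 < \gamma < 1$ and $\omega > 0$.
• Update threshold coefficients $\phi_n$ via (where $\epsilon_n$ below denotes step size)

$$\phi_{n+1} = \phi_n - \epsilon_{n+1}\hat{\nabla}_\phi J_n(\mu_{\phi_n}), \quad \epsilon_n = \epsilon/(n + 1 + s)^\zeta, \quad 0.5 < \zeta < 1, \text{ and } \epsilon, s > 0. \qquad (23)$$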
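The update (23) is short enough to sketch directly. The code below is a minimal illustration under assumptions: `sample_cost` stands in for one batch simulation of the sample-path cost (here replaced by a hypothetical noise-free quadratic so the example is self-contained), and the default step-size constants are arbitrary choices within the stated ranges.

```python
import numpy as np

def spsa(sample_cost, phi0, n_iter=2000, omega=0.1, gamma=0.6,
         eps=0.2, s=1.0, zeta=0.6, seed=0):
    """Simultaneous perturbation stochastic approximation (two cost evaluations per step)."""
    rng = np.random.default_rng(seed)
    phi = np.array(phi0, dtype=float)
    for n in range(n_iter):
        omega_n = omega / (n + 1) ** gamma          # perturbation size omega_n
        eps_next = eps / (n + 2 + s) ** zeta        # step size eps_{n+1}
        d = rng.choice([-1.0, 1.0], size=phi.size)  # random +/-1 direction
        diff = sample_cost(phi + omega_n * d) - sample_cost(phi - omega_n * d)
        grad_est = diff / (2.0 * omega_n) * d       # same two evaluations for every component
        phi = phi - eps_next * grad_est
    return phi

# Hypothetical stand-in for the simulated cost: a quadratic with a known minimizer.
target = np.array([1.0, -2.0, 0.5])
toy_cost = lambda phi: float(np.sum((phi - target) ** 2))

phi0 = np.zeros(3)
phi_hat = spsa(toy_cost, phi0)
print("initial error:", np.linalg.norm(phi0 - target))
print("final error:  ", np.linalg.norm(phi_hat - target))
```

In the actual problem each call to `sample_cost` would run the Kalman filters and predictors under the policy defined by `phi` until the stop action, so the two evaluations per iteration dominate the computational cost regardless of the dimension of `phi`.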



Several possible simulation based stochastic approximation algorithms can be used to estimate
$\mu_{\theta^*}$ in (21). In our numerical examples, we used Algorithm 1 to estimate the optimal parametrized
policy. Algorithm 1 is a Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm
[15]; see [16] for other more sophisticated gradient estimators. Algorithm 1 generates a sequence
of estimates $\phi_n$ and thus $\theta_n$, n = 1, 2, . . . , that converges to a local minimum $\theta^*$ of (21) with
policy $\mu_{\theta^*}(P)$. In Algorithm 1 we denote the policy as $\mu_\phi$ since $\theta$ is parametrized in terms of
$\phi$ as described above.

The SPSA algorithm [15] picks a single random direction $d_n$ (see Step 2) along which
the derivative is evaluated after each batch n. As is apparent from Step 2 of Algorithm 1,
evaluation of the gradient estimate $\hat{\nabla}_\phi J_n$ requires only 2 batch simulations. This is unlike
the well known Kiefer-Wolfowitz stochastic approximation algorithm [15] where 2m batch
simulations are required to evaluate the gradient estimate. Since the stochastic gradient algorithm
(23) converges to a local optimum, it is necessary to retry with several distinct initial conditions.
V. Application: GMTI Radar Scheduling and Numerical Results

This section illustrates the performance of the monotone parametrized policy (21) computed via
Algorithm 1 in a GMTI radar scheduling problem. We first show that the nonlinear measurement
model of a GMTI tracker can be approximated satisfactorily by the linear Gaussian model (1)
that was used above. Therefore the main result Theorem 1 applies, implying that the optimal
radar micro-management decision policy is monotone. To illustrate these micro-management
policies numerically, we then consider two important GMTI surveillance problems: the target
fly-by problem and the persistent surveillance problem.

A. GMTI Kinematic Model and Justification of Linearized Model (1)

The observation model below is an abstraction based on approximating several underlying pre- 
processing steps. For example, given raw GMTI measurements, space-time adaptive processing 
(STAP) (which is a two-dimensional adaptive filter) is used for near real-time detection, see 
O and references therein. Similar observation models can be used as abstractions of synthetic 
aperture radar (SAR) based processing. 

A modern GMTI radar manager operates on three time-scales (the description below is a
simplified variant of an actual radar system):

• Individual observations of target l are obtained on the fast time-scale t = 1, 2, .... The
period at which t ticks is typically 1 millisecond. At this time-scale, ground targets can be
considered to be static.

• Decision epoch $k = 1, 2, \ldots, \tau$ is the time-scale at which the micro-manager and target
tracker operate. Recall $\tau$ is the stopping time at which the micro-manager decides to stop
and return control to the macro-manager. The clock-period at which k ticks is typically
T = 0.1 seconds. At this epoch time-scale k, the targets move according to the kinematic
model (24), (25) below. Each epoch k is comprised of intervals $t = 1, 2, \ldots, \Delta$ of the fast
time-scale, where $\Delta$ is typically of the order of 100. So, 100 observations are integrated at
the t-time-scale to yield a single observation at the k-time-scale.

• The scheduling interval n = 1, 2, . . . is the time-scale at which the macro-manager operates.
Each scheduling interval n is comprised of $\tau_n$ decision epochs. This stopping time $\tau_n$ is
determined by the micro-manager. $\tau_n$ is typically in the range 10 to 50; in absolute time
it corresponds to the range 1 to 5 seconds. In such a time period, a ground target moving
at 50 km per hour moves approximately in the range 14 to 70 meters.
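The numbers quoted for the three time-scales follow from simple arithmetic, sketched below. The variable names are illustrative; the values are the typical ones stated in the list above.

```python
fast_period_s = 1e-3                       # fast time-scale tick: 1 ms
delta = 100                                # fast ticks integrated per decision epoch
epoch_period_s = fast_period_s * delta     # decision-epoch period T

target_speed_mps = 50 * 1000 / 3600        # 50 km/h expressed in m/s

# Duration of a scheduling interval and distance travelled, for tau_n = 10 and 50.
summary = {}
for tau_n in (10, 50):
    duration_s = tau_n * epoch_period_s
    summary[tau_n] = (duration_s, target_speed_mps * duration_s)
    print(f"tau_n = {tau_n}: {duration_s:.1f} s, {summary[tau_n][1]:.1f} m")
```

The distances come out just under 14 m and just under 70 m, matching the approximate range given in the text.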
1) GMTI Kinematic Model: The tracker assumes that each target $l \in \{1, \ldots, L\}$ has kinematic
model and GMTI observations

$$s^l_{k+1} = F s^l_k + G w^l_k, \qquad (24)$$

$$z^l_k = \begin{cases} h(s^l_k, \xi_k) + \dfrac{1}{\sqrt{\nu^l \Delta}}\, v^l_k, & \text{with probability } p^l_d, \\ \emptyset, & \text{with probability } 1 - p^l_d. \end{cases} \qquad (25)$$

Here $z^l_k$ denotes a 3-dimensional (range, bearing and range rate) observation vector of target l
at epoch time k and $\xi_k$ denotes the Cartesian coordinates and speed of the platform (aircraft)
on which the GMTI radar is mounted. The noise processes $w^l_k$ and $v^l_k/\sqrt{\nu^l \Delta}$ are zero-mean
Gaussian random vectors with covariance matrices $Q$ and $R^l(\nu^l)$, respectively. The observation
in decision epoch k is the average of the $\nu^l \Delta$ measurements obtained at the fast time scale
t. Thus the observation noise variance in (25) is scaled by the reciprocal of the target priority
$\nu^l \Delta$. In (24), (25) for a GMTI system,
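$$F = \begin{bmatrix} 1 & T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad G = \begin{bmatrix} T^2/2 & 0 \\ T & 0 \\ 0 & T^2/2 \\ 0 & T \end{bmatrix}, \quad Q = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix}, \qquad (26)$$

$$h(s, \xi) = \begin{bmatrix} \sqrt{(x - \xi_x)^2 + (y - \xi_y)^2 + \xi_z^2} \\[4pt] \arctan\dfrac{y - \xi_y}{x - \xi_x} \\[8pt] \dfrac{(x - \xi_x)(\dot{x} - \dot{\xi}_x) + (y - \xi_y)(\dot{y} - \dot{\xi}_y)}{\sqrt{(x - \xi_x)^2 + (y - \xi_y)^2 + \xi_z^2}} \end{bmatrix}.$$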
Recall that T is typically 0.1 seconds. The elements of $h(s, \xi)$ correspond to range, azimuth,
and range rate, respectively. Also $\xi = (\xi_x, \dot{\xi}_x, \xi_y, \dot{\xi}_y)$ denotes the x and y position and speeds,
respectively, and $\xi_z$ denotes the altitude, assumed to be constant, of the aircraft on which the
GMTI radar is mounted.
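One decision epoch of the model (24), (25) can be simulated as sketched below. Several things here are assumptions rather than statements from the paper: the constant-velocity forms of `F` and `G`, the state ordering (x, x-velocity, y, y-velocity), the use of `arctan2` to resolve the quadrant of the azimuth, a scalar process-noise standard deviation `sigma_w`, and the helper names `h` and `step`. The priority `nu` is assumed strictly positive in this sketch.

```python
import numpy as np

T = 0.1   # decision-epoch period in seconds
# Assumed constant-velocity kinematics for state (x, xdot, y, ydot).
F = np.array([[1, T, 0, 0], [0, 1, 0, 0], [0, 0, 1, T], [0, 0, 0, 1]], dtype=float)
G = np.array([[T**2 / 2, 0], [T, 0], [0, T**2 / 2], [0, T]], dtype=float)

def h(s, xi, xi_z):
    """Range, azimuth and range rate of target state s seen from platform state xi
    flying at constant altitude xi_z."""
    dx, dvx, dy, dvy = np.asarray(s, dtype=float) - np.asarray(xi, dtype=float)
    r = np.sqrt(dx**2 + dy**2 + xi_z**2)
    return np.array([r, np.arctan2(dy, dx), (dx * dvx + dy * dvy) / r])

def step(s, xi, xi_z, nu, delta, p_d, sigma_w, R, rng):
    """One epoch: returns (next state, observation), with None for a missed observation."""
    s_next = F @ s + G @ (sigma_w * rng.standard_normal(2))
    if rng.random() < p_d:
        v = rng.multivariate_normal(np.zeros(3), R)
        z = h(s, xi, xi_z) + v / np.sqrt(nu * delta)   # noise scaled by 1/sqrt(nu * Delta)
    else:
        z = None
    return s_next, z

rng = np.random.default_rng(0)
s0 = np.array([3.0, 1.0, 4.0, 2.0])
s1, z_detected = step(s0, np.zeros(4), 0.0, nu=0.6, delta=100, p_d=1.0,
                      sigma_w=0.5, R=np.eye(3), rng=rng)
_, z_missed = step(s0, np.zeros(4), 0.0, nu=0.6, delta=100, p_d=0.0,
                   sigma_w=0.5, R=np.eye(3), rng=rng)
print(h(s0, np.zeros(4), 0.0), z_detected, z_missed)
```

A larger priority `nu` shrinks the effective measurement noise, which is how the priority allocation enters the tracker.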

2) Approximation by Linear Gaussian State Space Model: Starting with the nonlinear state
space model (24), the aim below is to justify the use of the linearized model (1). We start with
linearizing the model (24) as follows; see [17, Chapter 8.3]. For each target l, consider a nominal
deterministic target trajectory $\bar{s}^l_k$ and nominal measurement $\bar{z}^l_k$ where $\bar{s}^l_{k+1} = F\bar{s}^l_k$, $\bar{z}^l_k = h(\bar{s}^l_k, \xi_k)$.
Defining $\tilde{s}^l_k = s^l_k - \bar{s}^l_k$ and $\tilde{z}^l_k = z^l_k - \bar{z}^l_k$, a first order Taylor series expansion around this nominal
trajectory yields,

$$\tilde{s}^l_{k+1} = F\tilde{s}^l_k + G w^l_k,$$

$$\tilde{z}^l_k = \begin{cases} \nabla_s h(\bar{s}^l_k, \xi_k)\,\tilde{s}^l_k + R_1(s^l_k, \bar{s}^l_k, \xi_k) + \dfrac{1}{\sqrt{\nu^l \Delta}}\, v^l_k, & \text{with probability } p^l_d, \\ \emptyset, & \text{with probability } 1 - p^l_d, \end{cases} \qquad (27)$$

where $\|R_1(s_k, \bar{s}_k, \xi_k)\| \leq \frac{1}{2}\|(s_k - \bar{s}_k)'\,\nabla^2_s h(\zeta, \xi_k)\,(s_k - \bar{s}_k)\|$ and $\zeta = \gamma\bar{s}_k + (1 - \gamma)s_k$ for some
$\gamma \in [0, 1]$. In the above equation, $\nabla_s h(s, \xi)$ is the Jacobian matrix defined as (for simplicity we
omit the superscript l for target l),



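$$\nabla_s h(s, \xi) = \begin{bmatrix} \dfrac{\delta_x}{r} & 0 & \dfrac{\delta_y}{r} & 0 \\[8pt] \dfrac{-\delta_y}{\delta_x^2 + \delta_y^2} & 0 & \dfrac{\delta_x}{\delta_x^2 + \delta_y^2} & 0 \\[8pt] \dfrac{\dot{\delta}_x r^2 - \delta_x(\delta_x\dot{\delta}_x + \delta_y\dot{\delta}_y)}{r^3} & \dfrac{\delta_x}{r} & \dfrac{\dot{\delta}_y r^2 - \delta_y(\delta_x\dot{\delta}_x + \delta_y\dot{\delta}_y)}{r^3} & \dfrac{\delta_y}{r} \end{bmatrix}, \quad r = \sqrt{\delta_x^2 + \delta_y^2 + \delta_z^2}, \qquad (28)$$

where $\delta_x = x - \xi_x$, $\delta_y = y - \xi_y$ denotes the relative position of the target with respect to
the platform and $\dot{\delta}_x$, $\dot{\delta}_y$ denote the relative velocities. Since the target is ground based and the
platform is constant altitude, $\delta_z$ is a constant.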
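A Jacobian of this kind is easy to get wrong by a sign, so a finite-difference check is a useful safeguard. The sketch below assumes a measurement function consisting of range, azimuth (computed with `arctan2`) and range rate in terms of the relative state (relative x, relative x-velocity, relative y, relative y-velocity); the function names and the test point are hypothetical.

```python
import numpy as np

def h_rel(d, dz):
    """Range, azimuth, range rate from relative state d = (dx, dvx, dy, dvy) and altitude dz."""
    dx, dvx, dy, dvy = d
    r = np.sqrt(dx**2 + dy**2 + dz**2)
    return np.array([r, np.arctan2(dy, dx), (dx * dvx + dy * dvy) / r])

def jacobian(d, dz):
    """Analytic 3x4 Jacobian of h_rel with respect to d."""
    dx, dvx, dy, dvy = d
    r = np.sqrt(dx**2 + dy**2 + dz**2)
    rho2 = dx**2 + dy**2                  # squared ground range
    c = dx * dvx + dy * dvy               # numerator of the range rate
    return np.array([
        [dx / r, 0.0, dy / r, 0.0],
        [-dy / rho2, 0.0, dx / rho2, 0.0],
        [dvx / r - dx * c / r**3, dx / r, dvy / r - dy * c / r**3, dy / r],
    ])

def numerical_jacobian(d, dz, step=1e-2):
    """Central-difference approximation of the same Jacobian."""
    d = np.asarray(d, dtype=float)
    cols = []
    for i in range(4):
        e = np.zeros(4)
        e[i] = step
        cols.append((h_rel(d + e, dz) - h_rel(d - e, dz)) / (2 * step))
    return np.column_stack(cols)

d_test = np.array([35100.0, -97.0, 15040.0, -13.0])   # made-up relative state
J_analytic = jacobian(d_test, 10203.2)
J_numeric = numerical_jacobian(d_test, 10203.2)
print(np.max(np.abs(J_analytic - J_numeric)))
```

Since the relative state is the target state minus the platform state, derivatives with respect to the target state coincide with derivatives with respect to `d`.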



In (27), $\nabla^2_s h(\cdot, \cdot)$ denotes the 3×4×4 Hessian tensor. By evaluating this Hessian tensor for
typical operating modes and k < 50, we show below that

$$\frac{\|R_1(s_k, \bar{s}_k, \xi_k)\|}{\|\nabla_s h(\bar{s}_k, \xi_k)\,\tilde{s}_k\|} < 0.02, \qquad \frac{\|\nabla_s h(\bar{s}_0, \xi_0) - \nabla_s h(\bar{s}_k, \xi_k)\|}{\|\nabla_s h(\bar{s}_k, \xi_k)\|} < 0.06. \qquad (29)$$

The first inequality above says that the model is approximately linear in the sense that the ratio
of linearization error $R_1(\cdot)$ to linear term is small; the second inequality says that the model
is approximately time-invariant, in the sense that the relative magnitude of the error between
linearizing around $\bar{s}_0$ and $\bar{s}_k$ is small. Therefore, on the micro-manager time scale, the target
dynamics can be viewed as a linear time invariant state space model (1).

Justification of (29): Using typical GMTI operating parameters, we evaluate the bounds
in (29). Denote the state of the platform (aircraft) on which the radar is situated as
$p_0 = [p_{x,0}, \dot{p}_x, p_{y,0}, \dot{p}_y] = [-35000\,\text{m}, 100\,\text{m/s}, -15000\,\text{m}, 20\,\text{m/s}]$. Then the platform height is
$p_z = \sqrt{p_{x,0}^2 + p_{y,0}^2}\,\tan\theta_d$, where $\theta_d$ is the depression angle, typically between 10° and 25°. We assume
a depression angle of $\theta_d = 15°$ below, yielding $p_z = 10203.2$ m. Next, consider typical behaviour
of ground targets with speed 15 m/s (54 km/h) and select the following significantly different
initial target state vectors (denoted by superscripts a to e)







 100    3    40    7
 -70    5   -50   -6
  20   -4   200    1
 150  -15    10
  50    2    95   10

(30)
(31)
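The platform height quoted in the justification above can be reproduced in a few lines, assuming the height is the ground range of the platform from the origin multiplied by the tangent of the depression angle (an assumption that matches the stated 10203.2 m). Variable names are illustrative.

```python
import math

p_x0, p_y0 = -35000.0, -15000.0          # platform position in metres
depression_deg = 15.0                    # assumed depression angle

ground_range = math.hypot(p_x0, p_y0)    # horizontal distance to the origin
p_z = ground_range * math.tan(math.radians(depression_deg))
print(f"ground range = {ground_range:.1f} m, platform height = {p_z:.1f} m")
```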
Now, propagate these initial states using the target model with T = 0.1 s, $\sigma_x = \sigma_y = 0.5$,
$\sigma_r = 20$ m, $\sigma_{\dot{r}} = 5$ m/s, $\sigma_a = 0.5°$ with a true track variability parameter of 1.5 (used for true
track simulation as $\sigma_x$ and $\sigma_y$). Define (see (29) and recall that $\zeta = \gamma\bar{s} + (1 - \gamma)s$ for some
$\gamma \in [0, 1]$)



Tables I to III show how $D(\cdot)$ and $E(\cdot)$ evolve with iteration k = 10, 50, 100. The entries in the
tables are small, thereby justifying the linear time invariant state space model (1).

Remark: Since a linear Gaussian model is an accurate approximate model, most real GMTI 
trackers use an extended Kalman filter. Approximate nonlinear filtering methods such as sequen- 
tial Markov Chain Monte-Carlo methods (particle filters) are not required. 

B. Numerical Example 1: Target Fly-by 

With the above justification of the model (1), we present the first numerical example. Consider
L = 4 ground targets that are tracked by a GMTI platform, as illustrated in Figure 2. The nominal
range from the GMTI sensor to the target region is approximately $\bar{r} = 30$ km. For this example,
the initial (at the start of the micro-manager cycle) estimated and true target states of the four
targets are given in Table IV.

We assume in this example that the most uncertain target is regarded as being the highest
priority. Based on the initial states and estimates in Table IV, the mean square error values are
$\mathrm{MSE}(\hat{s}^1_0) = 710.87$, $\mathrm{MSE}(\hat{s}^2_0) = 222.16$, $\mathrm{MSE}(\hat{s}^3_0) = 187.37$, and $\mathrm{MSE}(\hat{s}^4_0) = 140.15$. Thus,
target l = 1 is the most uncertain and allocated the highest priority. So we denote a = 1.
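The mean square error values can be reproduced from the eight state vectors listed in Table IV by averaging the squared componentwise differences of each pair. In the sketch below the array names are illustrative, and which vector of each pair is the estimate does not affect the result.

```python
import numpy as np

# The four pairs of vectors from Table IV (targets 1 to 4).
estimates = np.array([
    [130.0, 5.5, 84.0, 8.1],
    [-47.88, -2.38, 210.41, 0.418],
    [55.84, 2.37, 121.74, 9.56],
    [-55.13, 5.75, -68.41, -6.10],
])
true_states = np.array([
    [100.0, 3.0, 40.0, 7.0],
    [-20.0, -4.0, 200.0, 1.0],
    [50.0, 2.0, 95.0, 10.0],
    [-70.0, 5.0, -50.0, -6.0],
])

mse = np.mean((estimates - true_states) ** 2, axis=1)   # one value per target
a = int(np.argmax(mse)) + 1                             # 1-based index of the most uncertain target
print(np.round(mse, 2), "highest priority target:", a)
```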



The simulation parameters are as follows: sampling time T = 0.1 s (see Section V-A1);
probability of detection $p_d = 0.75$ (for all targets, so superscript l is omitted); track standard
deviations of target model $\sigma_x = \sigma_y = 0.5$ m; measurement noise standard deviations $\sigma_r = 20$ m,
$\sigma_a = 0.5°$, $\sigma_{\dot{r}} = 5$ m/s; and platform states $[p_x, \dot{p}_x, p_y, \dot{p}_y] = [10\,\text{km}, 53\,\text{m/s}, -30\,\text{km}, 85\,\text{m/s}]$. We
assume a target priority vector of $\nu = [\nu^1, \nu^2, \nu^3, \nu^4] = [0.6, 0.39, 0.008, 0.002]$. Recall from
(25) that the target priority scales the inverse of the covariance of the observation noise. We
chose an operating cost of $c_\nu = 0.8$, and the stopping cost $C(P)$ specified in (11), with
constants a^'- -'^ = - = 0.05, = 5. The parametrized policy chosen for this example 
was $\mu_\theta(P^a, P^{-a})$ defined in (20). We used the SPSA algorithm (Algorithm 1) to estimate the
parameter $\theta^*$ that optimizes the objective (21). Since SPSA converges to a local minimum,
several initial conditions were evaluated via a random search.
          D(s_10, ξ_10)    D(s_50, ξ_50)    D(s_100, ξ_100)
s_0^a        0.0010           0.0052           0.0104
s_0^b        0.0009           0.0049           0.0104
s_0^c        0.0010           0.0059           0.0119
s_0^d        0.0007           0.0040           0.0080
s_0^e        0.0010           0.0053           0.0112

TABLE I

Rate of change of Jacobian for various running times.





          E(s_10, s̄_10, ξ_10, 0.1)    E(s_50, s̄_50, ξ_50, 0.1)    E(s_100, s̄_100, ξ_100, 0.1)
s_0^a        0.00019091                  0.0010597                   0.01395
s_0^b        0.00020866                  0.0011699                   0.014375
s_0^c        0.00019165                  0.0010813                   0.01453
s_0^d        0.0002008                   0.0011065                   0.011844
s_0^e        0.0002294                   0.0012735                   0.015946

TABLE II

Ratio of second-order to first-order term of Taylor series expansion for a = 0.1.





          E(s_10, s̄_10, ξ_10, 0.8)    E(s_50, s̄_50, ξ_50, 0.8)    E(s_100, s̄_100, ξ_100, 0.8)
s_0^a        0.00019267                  0.0011104                   0.014211
s_0^b        0.00021104                  0.0012164                   0.014633
s_0^c        0.00019447                  0.0011228                   0.014838
s_0^d        0.00020148                  0.0011386                   0.012178
s_0^e        0.00023083                  0.0013596                   0.016603

TABLE III

Ratio of second-order to first-order term of Taylor series expansion for a = 0.8.



Target    Estimate $\hat{s}^l_0$                       True state $s^l_0$
l = 1     [130, 5.5, 84, 8.1]'                [100, 3, 40, 7]'
l = 2     [-47.88, -2.38, 210.41, 0.418]'     [-20, -4, 200, 1]'
l = 3     [55.84, 2.37, 121.74, 9.56]'        [50, 2, 95, 10]'
l = 4     [-55.13, 5.75, -68.41, -6.10]'      [-70, 5, -50, -6]'

TABLE IV
Initial Target States and Estimates.
Figure 3 explores the sensitivity of the sample-path cost (achieved by the parametrized policy)
with respect to probability of detection, $p_d$, and the operating cost, $c_\nu$. The sample-path cost
increases with $c_\nu$ and decreases with $p_d$. Larger values of the operating cost, $c_\nu$, cause the radar
micro-manager to specify the "stop" action sooner than for lower values of $c_\nu$. As can be seen
in the figure, neither the sample-path cost nor the average stopping time is particularly sensitive
to changes in the probability of detection. However, as expected, varying the operating cost has
a large effect on both the sample-path cost and the associated average stopping time.

Figure 4 compares the optimal parametrized policy with periodic myopic policies. Such
periodic myopic policies stop at a deterministic pre-specified time (without considering state
information) and then return control to the macro-manager. The performance of the optimal
parametrized policy is measured using multiple initial conditions. As seen in Figure 4, the cost of
the optimal parametrized policy is the lower envelope of the costs of all possible periodic stopping
times, for each initial condition. The optimal periodic policy is highly dependent upon the initial
condition. The main performance advantage of the optimal parametrized policy is that it achieves
virtually the same cost as the optimal periodic policy for any initial condition.



C. Numerical Example 2: Persistent Surveillance 

As mentioned in Section I, persistent surveillance involves exhaustive surveillance of a region
over long time intervals, typically over the period of several hours or weeks [18], and is useful in
providing critical, long-term battlefield information. Figure 5 illustrates the persistent surveillance
setup. Here $\bar{r}$ is the nominal range from the target region to the GMTI platform, assumed in
our simulations to be approximately 30 km. The points on the GMTI platform track labeled
(1) - (72) correspond to locations* where we evaluate the Jacobian (28). Assume a constant
platform orbit speed of 250 m/s (or approximately 900 km/h [19]) and a constant altitude of
approximately 5000 m. Assuming 72 divisions along the 30 km radius orbit, the platform sensor
takes 10.4 seconds to travel between the track segments. Using a similar analysis to the Appendix,
the measurement model changes less than 5% in $l_2$-norm in 10.4 s, thus the optimal parameter
vector is approximately constant on each track segment.

Simulation parameters for this example are as follows: number of targets L = 4; sampling time
T = 0.1 s; probability of detection $p_d = 0.9$; track standard deviations of target model $\sigma_x = \sigma_y = 0.5$ m;
and measurement noise parameters $\sigma_r = 20$ m, $\sigma_a = 0.5°$, $\sigma_{\dot{r}} = 5$ m/s. The platform velocity is

*The platform state at location $n \in \{1, 2, \ldots, 72\}$ is defined as $p = [\bar{r}\cos(n \cdot 5°), -v\sin(n \cdot 5°), \bar{r}\sin(n \cdot 5°), v\cos(n \cdot 5°)]$.
Fig. 2. Target Fly-by Scenario. The GMTI platform (aircraft) moves with constant altitude and velocity at nominal range
$\bar{r} = 30$ km from the target region ($r$ is defined in (28)). Initial states of the four targets are specified in Table IV.

Fig. 3. Dependence of the sample-path cost achieved by the parametrized policy on the probability of detection, $p_d$, and the
operating cost, $c_\nu$. The sample-path cost increases with the operating cost, but decreases with the probability of detection. Note
the stopping times associated with the labelled vertices above.

(a) Sample-path cost (b) Magnified region

Fig. 4. Plot of sample-path cost of periodic policies and the parametrized policy (thick-dashed line) versus initial conditions.
These initial conditions are ordered with respect to the cost achieved using the parametrized policy for that particular initial
condition. Notice that the sample-path cost is the lower envelope of all deterministic stopping times for any initial condition.



now changing (assuming a constant speed of v = 250 m/s), unlike the previous example, which
assumed a constant velocity platform. Since the linearized model will be different at each of
the pre-specified points, (1)-(72), along the GMTI track, we computed the optimal parametrized
policy at each of the respective locations. The radar manager then switches between these policies
depending on the estimated position of the targets.
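The 72 platform states along the circular track can be generated as sketched below. This assumes the platform state is ordered as (x position, x velocity, y position, y velocity), a 30 km orbit radius, a speed of 250 m/s and 5° spacing; the function name `platform_state` is hypothetical.

```python
import math

def platform_state(n, r_bar=30_000.0, v=250.0, step_deg=5.0):
    """Platform state (x, xdot, y, ydot) at track location n on a circular orbit."""
    ang = math.radians(n * step_deg)
    return (r_bar * math.cos(ang), -v * math.sin(ang),
            r_bar * math.sin(ang), v * math.cos(ang))

states = [platform_state(n) for n in range(1, 73)]

# Sanity checks: constant range, constant speed, velocity tangent to the orbit.
px, vx, py, vy = states[17]
print(math.hypot(px, py), math.hypot(vx, vy), px * vx + py * vy)
```

Each location has its own linearized measurement model, so one policy parameter vector would be estimated per entry of `states`.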



We consider Case 4 of Section III-A where the radar devotes all its resources to one target,
and none to the other targets. That is, we assume a target priority vector of $\nu = [1, 0, 0, 0]$. In this
case, the first target is allocated a Kalman filter, with all the other targets allocated measurement-free
Kalman predictors. Since the threshold parametrization vectors depend on the target's state
and measurement models, the first target l = a has a unique parameter vector, whereas targets
$l \neq a$ all have the same parameter vectors. Also, $\underline{\theta}^l = \bar{\theta}^l$ for all $l \in \{1, 2, \ldots, L\}$.



We chose $\alpha^1 = 0.25$, $\alpha^2 = \alpha^3 = \alpha^4 = 0$, $\beta^1 = 0.25$, $\beta^2 = \beta^3 = \beta^4 = 1$ in stopping cost (11)
(average mutual information difference stopping cost). The parametrized policy considered was
$\mu_\theta(P^a, P^{-a})$ in (20). The optimal parametrized policy was computed using Algorithm 1 at each
of the 72 locations on the GMTI sensor track. As the GMTI platform orbits the target region,
we switch between these parametrized policy vectors, thus continually changing the adopted
tracking policy. We implemented the following macro-manager: $a = \arg\max_{l = 1, \ldots, L}\{\log|P^l|\}$.
The priority vector was chosen as $\nu^a = 1$ and $\nu^l = 0$ for all $l \neq a$. Figure 6 shows log-determinants
of each of the targets' error covariance matrices over multiple macro-management
tracking cycles.
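The macro-manager rule described above amounts to an argmax over log-determinants. A minimal sketch, with a hypothetical function name and made-up covariances, and with targets indexed from 0:

```python
import numpy as np

def select_priority_target(covariances):
    """Return the index of the covariance with the largest log-determinant,
    together with a priority vector that gives that target all the resources."""
    logdets = [np.linalg.slogdet(P)[1] for P in covariances]
    a = int(np.argmax(logdets))
    nu = np.zeros(len(covariances))
    nu[a] = 1.0
    return a, nu

# Made-up covariances for three targets.
covs = [np.eye(4), 3.0 * np.eye(4), 2.0 * np.eye(4)]
a, nu = select_priority_target(covs)
print(a, nu)
```

`slogdet` is used rather than `log(det(...))` because it stays numerically stable when the determinant is very small or very large.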

VI. Conclusions 

This paper considers a sequential detection problem with mutual information stopping cost.
Using lattice programming we prove that the optimal policy has a monotone structure in terms
of the covariance estimates (Theorem 1). The proof involved showing monotonicity of the
Riccati and Lyapunov equations (Theorem 2). Several examples of parametrized decision policies
that satisfy this monotone structure were given. A simulation-based adaptive filtering algorithm
(Algorithm 1) was given to estimate the parametrized policy. The sequential detection problem
was illustrated in a GMTI radar scheduling problem with numerical examples.

Appendix 

This appendix presents the proof of the main result, Theorem 1. Appendix A presents the value
iteration algorithm and supermodularity that will be used as the basis of the inductive proof.
The proof of Theorem 1 in Appendix B uses lattice programming [20] and depends on certain
monotone properties of the Kalman filter Riccati and Lyapunov equations. These properties are
proved in Theorem 2 in Appendix C.



A. Preliminaries 



We first rewrite Bellman's equation (15) in a form that is suitable for our analysis. Define

$$\bar{C}(P) = c_\nu - C(P) + \sum_{z^a, z^{-a}} C\left(\mathcal{R}(P^a, z^a), \mathcal{L}(\bar{P}^a), \mathcal{R}(P^{-a}, z^{-a}), \mathcal{L}(\bar{P}^{-a})\right) q_{z^a} q_{z^{-a}},$$

$$\bar{V}(P) = V(P) - C(P), \quad \text{where } P = (P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}), \qquad (32)$$

$$q_{z^l} = \begin{cases} p^l_d, & \text{if } z^l \neq \emptyset, \\ 1 - p^l_d, & \text{otherwise.} \end{cases}$$

In (32) we have assumed that the missed observation events in (25) are statistically independent
between targets, and so $q_{z^{-a}} = \prod_{l \neq a} q_{z^l}$. Actually, the results below hold for any joint distribution
of missed observation events (and therefore allow these events to be dependent between targets).
For notational convenience we assume (25) and independent missed observation events.



October 20, 2011 



DRAFT 



[Figure 5 graphic. Surviving labels: Target Region; Track of Platform.]

Fig. 5. Representation of the persistent surveillance scenario in GMTI systems. The GMTI platform (aircraft) orbits the target
region in order to obtain persistent measurements as long as targets remain within the target region. The nominal range from
the platform to the target region is assumed to be 30 km.

Fig. 6. Plot of log-determinants for each target over multiple scheduling intervals. On each scheduling interval, a Kalman filter
is deployed to track one target and Kalman predictors track the remaining 3 targets. The bold line corresponds to the target
allocated the Kalman filter by the micro-manager in each scheduling interval. Initially a Kalman filter is deployed on target
l = 1. Data points marked in red indicate missing observations.
Clearly $\bar{V}(\cdot)$ and the optimal decision policy $\mu^*(\cdot)$ satisfy Bellman's equation

$$\bar{V}(P) = \min\{Q(P, 1), Q(P, 2)\}, \quad \mu^*(P) = \arg\min_{u \in \{1, 2\}}\{Q(P, u)\}, \qquad (33)$$

$$Q(P, 1) = 0, \quad Q(P, 2) = \bar{C}(P) + \sum_{z^a, z^{-a}} \bar{V}\left(\mathcal{R}(P^a, z^a), \mathcal{L}(\bar{P}^a), \mathcal{R}(P^{-a}, z^{-a}), \mathcal{L}(\bar{P}^{-a})\right) q_{z^a} q_{z^{-a}}.$$

Our goal is to characterize the stopping set defined as

$$S_{\text{stop}} = \{P : Q(P, 1) < Q(P, 2)\} = \{P : Q(P, 2) > 0\} = \{P : \mu^*(P) = 1\}.$$

Since the argmin function is translation invariant (that is, $\arg\min_u f(P, u) = \arg\min_u (f(P, u) + h(P))$
for any functions f and h), both the stopping set $S_{\text{stop}}$ and optimal policy $\mu^*$ in these new
coordinates are identical to those in the original coordinate system (16), (15).
Value Iteration Algorithm: The value iteration algorithm will be used to construct a proof of
Theorem 1 by mathematical induction. Let k = 1, 2, ..., denote iteration number. The value
iteration algorithm is a fixed point iteration of Bellman's equation and proceeds as follows:

$$\bar{V}_0(P) = -C(P), \quad \bar{V}_{k+1}(P) = \min_{u \in \{1, 2\}} Q_{k+1}(P, u),$$

$$\mu^*_{k+1}(P) = \arg\min_{u \in \{1, 2\}} Q_{k+1}(P, u), \quad \text{where } Q_{k+1}(P, 1) = 0,$$

$$Q_{k+1}(P, 2) = \bar{C}(P) + \sum_{z^a, z^{-a}} \bar{V}_k\left(\mathcal{R}(P^a, z^a), \mathcal{L}(\bar{P}^a), \mathcal{R}(P^{-a}, z^{-a}), \mathcal{L}(\bar{P}^{-a})\right) q_{z^a} q_{z^{-a}}. \qquad (34)$$

Submodularity: Next, we define the key concept of submodularity [20]. While it can be defined
on general lattices with an arbitrary partial order, here we restrict the definition to the posets
$[\mathcal{M}, \succeq]$ and $[\mathbb{R}^m, \succeq_l]$, where the partial orders $\succeq$ and $\succeq_l$ were defined above.

Definition 1 (Submodularity and Supermodularity [20]): A scalar function $Q(P, u)$ is submodular
in $P^a$ if

$$Q(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}, 2) - Q(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}, 1) \leq Q(\tilde{P}^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}, 2) - Q(\tilde{P}^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}, 1),$$

for $P^a \succeq \tilde{P}^a$. $Q(\cdot, u)$ is supermodular if $-Q(\cdot, u)$ is submodular. A scalar function $Q(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}, u)$
is sub/supermodular in each component of $P^{-a}$ if it is sub/supermodular in each component $P^l$,
$l \neq a$. An identical definition holds with respect to $Q(\lambda^a, \lambda^{-a}, u)$ on $[\mathbb{R}^m, \succeq_l]$.

The most important feature of a supermodular (submodular) function $f(x, u)$ is that $\arg\min_u f(x, u)$
decreases (increases) in its argument x, see [20]. This is summarized in the following result.
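A scalar toy example makes the link between submodularity and a monotone argmin concrete. Everything in the sketch below is illustrative: the state is a scalar x instead of a covariance matrix, the function `Q` is invented, and ties are broken in favour of action 1.

```python
def argmin_policy(Q, xs, actions=(1, 2)):
    """Minimizing action at each x (the first listed action wins ties)."""
    return [min(actions, key=lambda u: Q(x, u)) for x in xs]

def is_submodular(Q, xs):
    """Check that Q(x, 2) - Q(x, 1) is nonincreasing along the increasing grid xs."""
    diffs = [Q(x, 2) - Q(x, 1) for x in xs]
    return all(d1 >= d2 for d1, d2 in zip(diffs, diffs[1:]))

# Toy cost: stopping costs 0, continuing costs 1 - x (cheaper when x is large).
Q = lambda x, u: 0.0 if u == 1 else 1.0 - x

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
policy = argmin_policy(Q, xs)
print(is_submodular(Q, xs), policy)
```

Because the difference between continuing and stopping shrinks as x grows, the minimizing action can only switch from 1 to 2 once, which yields a threshold policy.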

Theorem 3 ([20]): Suppose $Q(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}, u)$ is submodular in $P^a$, submodular in $\bar{P}^{-a}$,
supermodular in $P^{-a}$ and supermodular in $\bar{P}^a$. Then there exists a $\mu^*(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}) =
\arg\min_{u \in \{1, 2\}} Q(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a}, u)$ that is increasing in $P^a$, decreasing in $\bar{P}^a$, increasing in
$\bar{P}^{-a}$ and decreasing in $P^{-a}$.

Next we state a well known result (see [21] for proof) that the evolution of the covariance
matrix in the Lyapunov and Riccati equation are monotone.

Lemma 3 ([21]): $\mathcal{R}(\cdot)$ and $\mathcal{L}(\cdot)$ are monotone operators on the poset $[\mathcal{M}, \succeq]$. That is, if
$P_1 \succeq P_2$, then $\mathcal{L}(P_1) \succeq \mathcal{L}(P_2)$ and for all z, $\mathcal{R}(P_1, z) \succeq \mathcal{R}(P_2, z)$.
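The monotonicity of the two operators can be spot-checked by testing whether the difference of the updated covariances is positive semidefinite. The sketch assumes the update forms $FPF' + Q$ and $FPF' - FPH'(HPH' + R)^{-1}HPF' + Q$; the system matrices and the randomly generated covariances are made-up test data.

```python
import numpy as np

def lyapunov(P, F, Q):
    """Predictor update F P F' + Q."""
    return F @ P @ F.T + Q

def riccati(P, F, H, Q, R):
    """Filter update F P F' - F P H'(H P H' + R)^{-1} H P F' + Q."""
    K = F @ P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return F @ P @ F.T - K @ H @ P @ F.T + Q

def dominates(A, B, tol=1e-9):
    """True if A - B is positive semidefinite up to a numerical tolerance."""
    D = A - B
    return np.linalg.eigvalsh((D + D.T) / 2).min() >= -tol

T = 0.1
F = np.array([[1, T, 0, 0], [0, 1, 0, 0], [0, 0, 1, T], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)   # hypothetical position-only sensor
Q = 0.05 * np.eye(4)
R = np.eye(2)

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
P2 = A @ A.T + np.eye(4)        # positive definite
P1 = P2 + B @ B.T               # P1 dominates P2

lyap_ok = dominates(lyapunov(P1, F, Q), lyapunov(P2, F, Q))
ricc_ok = dominates(riccati(P1, F, H, Q, R), riccati(P2, F, H, Q, R))
print(lyap_ok, ricc_ok)
```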

Finally, we present the following lemma which states that the stopping costs (stochastic
observability) are monotone in the covariance matrices. The proof of this lemma depends on
Theorem 2, the proof of which is given in Appendix C below.



Lemma 4: For $C(P)$ in Case 1 (9), Case 2 (10) and Case 3 (11), the cost $\bar{C}(P^a, \bar{P}^a, P^{-a}, \bar{P}^{-a})$
defined in (32) is decreasing in $P^a$, $\bar{P}^{-a}$, and increasing in $P^{-a}$, $\bar{P}^a$. (Case 4 is a special case
when ai = for all / G {1, . . . , L}.) 

Proof: For Case 1 and Case 2 let $l^* = \arg\max_{l \neq a}\left[\alpha^l \log|\bar{P}^l| - \beta^l \log|P^l|\right]$ or
$l^* = \arg\min_{l \neq a}\left[\alpha^l \log|\bar{P}^l| - \beta^l \log|P^l|\right]$, respectively. From (32) with $|\cdot|$ denoting determinant,
C(P) . log E'o. ^^.(^")+«" log log 

(35) 



For Case 3, 

a, l-^l^")! oaY-T |7^(p^ ;2'^) | >^ |/:(p')l |7^(p^ 

C(P) = c.-a^ log 5^ log " ip;. 



(36) 



Theorem 2 shows that $|\mathcal{L}(\bar{P}^l)|/|\bar{P}^l|$ and $|\mathcal{R}(P^l, z^l)|/|P^l|$ are decreasing in $\bar{P}^l$ and $P^l$ for all l.



B. Proof of Theorem [7] 



Proof: The proof is by induction on the value iteration algorithm (34). Note Vo(P) defined 
in (34) is decreasing in p<^^p-<^ and increasing in p-'^^p"^ via Lemma [3} 

Next assume Vfc(P) is decreasing in P'^^p-"^ and increasing in P^°^,P'^. Since TZ{P"-,y), 
C{P"'), Tl{P^"',y^'^) and C{P^"') are monotone increasing in P", P'*, P^" and P^", it follows 
that the term Vk (7^(P^ z"), C{P''),n{P-'', z'"), CiP-")) q^aq^-a is decreasing in P^ p-" and 



increasing in p-'^^p^ in (34). Next, it follows from Lemma UJ that C{P"- , , P"- , P'"- , P''') is 



decreasing in P'^^p-"^ and increasing in P''^,P'^. Therefore from ( |34[ ), Qfc+i(P, 2) inherits this 
property. Hence Vfe+i(P'', P^") is decreasing in P'^, P^" and increasing in P^", P". Since value 
iteration converges pointwise, i.e, Vfc(P) pointwise l^(P), it follows that V{P) is decreasing in 
pa p-a increasing in P~", P". 
Therefore, $Q(P, 2)$ is decreasing in $P^a$, $\bar{P}^{-a}$ and increasing in $P^{-a}$, $\bar{P}^a$. This implies $Q(P, u)$ is submodular in $(P^a, u)$, submodular in $(\bar{P}^{-a}, u)$, supermodular in $(P^{-a}, u)$ and supermodular in $(\bar{P}^a, u)$. Therefore, from Theorem 3 there exists a version of $\mu^*(P)$ that is increasing in $P^a$, $\bar{P}^{-a}$ and decreasing in $P^{-a}$, $\bar{P}^a$. ■
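As a rough illustration of this last step, the following Python sketch replaces the covariance by a hypothetical scalar `p` and uses a made-up continue cost `q_continue` that is decreasing in `p`. The stop cost is assumed to be normalized to zero. None of these numbers come from the paper; they only show that when the cost of continuing relative to stopping decreases in `p`, the minimizing action switches from 1 (stop) to 2 (continue) at most once as `p` grows, which is the monotone structure asserted above.

```python
def optimal_action(q_stop, q_cont):
    """Return 1 (stop) or 2 (continue), whichever has the smaller cost; ties go to stop."""
    return 1 if q_stop <= q_cont else 2

def q_continue(p):
    # Hypothetical continue cost, decreasing in the scalar stand-in p (an assumption).
    return 1.0 - 0.5 * p

# Grid p = 0.25, 0.50, ..., 5.00; the stop cost is assumed to be 0 everywhere.
grid = [0.25 * i for i in range(1, 21)]
policy = [optimal_action(0.0, q_continue(p)) for p in grid]

# A policy that is increasing in p never drops from 2 back to 1.
is_increasing = all(a <= b for a, b in zip(policy, policy[1:]))
print(policy)
print(is_increasing)
```

In the matrix setting the same argument is applied along the positive definite partial order, one argument of $Q$ at a time.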

C. Proof of Theorem 2
We start with the following lemma. 

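Lemma 5: If matrices $X$ and $Z$ are invertible, then for conformable matrices $W$ and $Y$,

$$\det(Z)\det(X + Y Z^{-1} W) = \det(X)\det(Z + W X^{-1} Y). \tag{37}$$

Proof: The Schur complement formulae applied to $\begin{pmatrix} X & Y \\ -W & Z \end{pmatrix}$ yield

$$\begin{pmatrix} I & Y Z^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} X + Y Z^{-1} W & 0 \\ 0 & Z \end{pmatrix} \begin{pmatrix} I & 0 \\ -Z^{-1} W & I \end{pmatrix} = \begin{pmatrix} I & 0 \\ -W X^{-1} & I \end{pmatrix} \begin{pmatrix} X & 0 \\ 0 & Z + W X^{-1} Y \end{pmatrix} \begin{pmatrix} I & X^{-1} Y \\ 0 & I \end{pmatrix}.$$

Taking determinants yields (37).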
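The identity (37) can be spot-checked numerically. The sketch below is only a sanity check under stated assumptions: it restricts all four matrices to $2 \times 2$ so that closed-form determinants and inverses suffice, uses only the Python standard library, and the helper names (`matmul`, `det2`, `inv2`, `lemma5_sides`) are illustrative rather than taken from the paper.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def random_matrix(rng):
    """2x2 matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]

def lemma5_sides(X, Y, W, Z):
    """Return (det(Z) det(X + Y Z^-1 W), det(X) det(Z + W X^-1 Y))."""
    lhs = det2(Z) * det2(add(X, matmul(matmul(Y, inv2(Z)), W)))
    rhs = det2(X) * det2(add(Z, matmul(matmul(W, inv2(X)), Y)))
    return lhs, rhs

rng = random.Random(0)
shift = [[3.0, 0.0], [0.0, 3.0]]
# Adding 3 I makes X and Z strictly diagonally dominant, hence invertible.
X = add(random_matrix(rng), shift)
Z = add(random_matrix(rng), shift)
Y, W = random_matrix(rng), random_matrix(rng)
lhs, rhs = lemma5_sides(X, Y, W, Z)
print(abs(lhs - rhs) < 1e-9)
```

Note that $X$ and $Z$ need not be symmetric or positive definite for (37); invertibility is the only requirement, which is why the check uses generic matrices.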



Theorem 2(i): Given positive definite matrices $Q$ and $P_1 \succ P_2$ and arbitrary matrix $F$, $\det(\mathcal{L}(P))/\det(P)$ is decreasing in $P$, or equivalently,

$$\frac{\det(F P_1 F^T + Q)}{\det(P_1)} < \frac{\det(F P_2 F^T + Q)}{\det(P_2)}.$$

Proof: Applying Lemma 5 with $[X, Y, W, Z] = [Q, F, F^T, P^{-1}]$,

$$\frac{\det(F P F^T + Q)}{\det(P)} = \det(P^{-1} + F^T Q^{-1} F)\det(Q). \tag{38}$$

Since $P_1 \succ P_2 \succ 0$, then $0 \prec P_1^{-1} \prec P_2^{-1}$ and thus $0 \prec P_1^{-1} + F^T Q^{-1} F \prec P_2^{-1} + F^T Q^{-1} F$. Since positive definite dominance implies dominance of determinants, it follows that

$$\det(P_1^{-1} + F^T Q^{-1} F) < \det(P_2^{-1} + F^T Q^{-1} F).$$

Using (38), the result follows.
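In the scalar case the claim is immediate, since $(f^2 p + q)/p = f^2 + q/p$ is decreasing in $p > 0$. For matrices, a minimal numerical sketch of both (38) and the monotonicity is given below. It assumes $2 \times 2$ matrices drawn at random (not the paper's GMTI models), builds $P_1 \succ P_2$ by adding a positive definite matrix to $P_2$, and uses illustrative helper names such as `lyapunov_ratio` and `rhs_38` that do not appear in the paper.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def random_matrix(rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]

def random_pd(rng):
    """A A^T + 0.5 I, which is positive definite for any A."""
    A = random_matrix(rng)
    return add(matmul(A, transpose(A)), [[0.5, 0.0], [0.0, 0.5]])

def lyapunov_ratio(P, F, Q):
    """det(F P F^T + Q) / det(P), the left-hand side of (38)."""
    return det2(add(matmul(matmul(F, P), transpose(F)), Q)) / det2(P)

def rhs_38(P, F, Q):
    """det(P^-1 + F^T Q^-1 F) det(Q), the right-hand side of (38)."""
    return det2(add(inv2(P), matmul(matmul(transpose(F), inv2(Q)), F))) * det2(Q)

rng = random.Random(1)
F, Q = random_matrix(rng), random_pd(rng)
P2 = random_pd(rng)
P1 = add(P2, random_pd(rng))  # P1 - P2 is positive definite, so P1 dominates P2

print(abs(lyapunov_ratio(P2, F, Q) - rhs_38(P2, F, Q)) < 1e-9)
print(lyapunov_ratio(P1, F, Q) < lyapunov_ratio(P2, F, Q))
```

A random check of this kind cannot replace the proof, but it is a quick way to catch sign or transpose errors when implementing the ratio.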
Theorem 2(ii): Given positive definite matrices $Q$, $R$ and arbitrary matrix $F$, $\det(\mathcal{R}(P, z))/\det(P)$ is decreasing in $P$. That is, for $P_1 \succ P_2$,

$$\frac{\det(F P_1 F^T - F P_1 H^T (H P_1 H^T + R)^{-1} H P_1 F^T + Q)}{\det(P_1)} \le \frac{\det(F P_2 F^T - F P_2 H^T (H P_2 H^T + R)^{-1} H P_2 F^T + Q)}{\det(P_2)}. \tag{39}$$
Proof: Using the matrix inversion lemma $(A + BCD)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}$,

$$(P^{-1} + H^T R^{-1} H)^{-1} = P - P H^T (H P H^T + R)^{-1} H P,$$

$$F (P^{-1} + H^T R^{-1} H)^{-1} F^T + Q = F P F^T - F P H^T (H P H^T + R)^{-1} H P F^T + Q,$$

$$\det(F P F^T - F P H^T (H P H^T + R)^{-1} H P F^T + Q) = \det(Q + F (P^{-1} + H^T R^{-1} H)^{-1} F^T). \tag{40}$$



Applying the identity (37) with $[X, Y, W, Z] = [Q, F, F^T, P^{-1} + H^T R^{-1} H]$ we have

$$\det(P^{-1} + H^T R^{-1} H)\det(Q + F (P^{-1} + H^T R^{-1} H)^{-1} F^T) = \det(Q)\det(P^{-1} + H^T R^{-1} H + F^T Q^{-1} F). \tag{41}$$



Further, using (37) with $[X, Y, W, Z] = [P^{-1}, H^T, H, R]$, we have

$$\det(P^{-1} + H^T R^{-1} H) = \det(P^{-1})\det(R + H P H^T)/\det(R). \tag{42}$$



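Substituting (42) into (41),

$$\det(P^{-1})\det(R + H P H^T)\det(Q + F (P^{-1} + H^T R^{-1} H)^{-1} F^T) = \det(Q)\det(P^{-1} + H^T R^{-1} H + F^T Q^{-1} F)\det(R), \tag{43}$$

and hence, since $\det(P^{-1}) = 1/\det(P)$,

$$\frac{\det(Q + F (P^{-1} + H^T R^{-1} H)^{-1} F^T)}{\det(P)} = \frac{\det(Q)\det(P^{-1} + H^T R^{-1} H + F^T Q^{-1} F)\det(R)}{\det(R + H P H^T)}. \tag{44}$$

From (44) and (40),

$$\frac{\det(F P F^T - F P H^T (H P H^T + R)^{-1} H P F^T + Q)}{\det(P)} = \frac{\det(Q)\det(P^{-1} + H^T R^{-1} H + F^T Q^{-1} F)\det(R)}{\det(R + H P H^T)}. \tag{45}$$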

We are now ready to prove the result. Since $P_1 \succ P_2 \succ 0$,

• $0 \prec P_1^{-1} + H^T R^{-1} H + F^T Q^{-1} F \prec P_2^{-1} + H^T R^{-1} H + F^T Q^{-1} F$,

• $\det(P_1^{-1} + H^T R^{-1} H + F^T Q^{-1} F) < \det(P_2^{-1} + H^T R^{-1} H + F^T Q^{-1} F)$,

• $R + H P_1 H^T \succeq R + H P_2 H^T \succ 0$,

• $\det(R + H P_1 H^T) \ge \det(R + H P_2 H^T)$.

Therefore, (39) follows from the following inequality:

$$\frac{\det(Q)\det(P_1^{-1} + H^T R^{-1} H + F^T Q^{-1} F)\det(R)}{\det(R + H P_1 H^T)} \le \frac{\det(Q)\det(P_2^{-1} + H^T R^{-1} H + F^T Q^{-1} F)\det(R)}{\det(R + H P_2 H^T)}.$$
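In the scalar case the Riccati expression reduces to $f^2 p r/(h^2 p + r) + q$, so the ratio equals $f^2 r/(h^2 p + r) + q/p$, which is decreasing in $p > 0$. The sketch below checks the matrix statements (45) and (39) numerically under the same kind of assumptions as before: $2 \times 2$ matrices drawn at random, a square $H$ for simplicity, standard library only, and illustrative helper names (`riccati`, `riccati_ratio`, `rhs_45`) that are not from the paper.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def random_matrix(rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]

def random_pd(rng):
    """A A^T + 0.5 I, which is positive definite for any A."""
    A = random_matrix(rng)
    return add(matmul(A, transpose(A)), [[0.5, 0.0], [0.0, 0.5]])

def riccati(P, F, H, Q, R):
    """F P F^T - F P H^T (H P H^T + R)^-1 H P F^T + Q."""
    Ft, Ht = transpose(F), transpose(H)
    gain = matmul(matmul(P, Ht), inv2(add(matmul(matmul(H, P), Ht), R)))
    correction = matmul(matmul(matmul(F, gain), matmul(H, P)), Ft)
    return add(sub(matmul(matmul(F, P), Ft), correction), Q)

def riccati_ratio(P, F, H, Q, R):
    """Left-hand side of (45): det of the Riccati update divided by det(P)."""
    return det2(riccati(P, F, H, Q, R)) / det2(P)

def rhs_45(P, F, H, Q, R):
    """Right-hand side of (45)."""
    Ft, Ht = transpose(F), transpose(H)
    info = add(add(inv2(P), matmul(matmul(Ht, inv2(R)), H)),
               matmul(matmul(Ft, inv2(Q)), F))
    return det2(Q) * det2(info) * det2(R) / det2(add(R, matmul(matmul(H, P), Ht)))

rng = random.Random(2)
F, H = random_matrix(rng), random_matrix(rng)
Q, R = random_pd(rng), random_pd(rng)
P2 = random_pd(rng)
P1 = add(P2, random_pd(rng))  # P1 - P2 is positive definite

print(abs(riccati_ratio(P2, F, H, Q, R) - rhs_45(P2, F, H, Q, R)) < 1e-9)
print(riccati_ratio(P1, F, H, Q, R) <= riccati_ratio(P2, F, H, Q, R))
```

Working with the right-hand side of (45) is what makes the monotonicity transparent: its numerator is increasing in $P^{-1}$ while its denominator is increasing in $P$.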